Preface



This book is intended as a text covering the central concepts of practical optimiza- 
tion techniques. It is designed for either self-study by professionals or classroom 
work at the undergraduate or graduate level for students who have a technical back- 
ground in engineering, mathematics, or science. Like the field of optimization itself, 
which involves many classical disciplines, the book should be useful to system ana- 
lysts, operations researchers, numerical analysts, management scientists, and other 
specialists from the host of disciplines from which practical optimization
applications are drawn. The prerequisites for convenient use of the book are relatively
modest, the prime requirement being some familiarity with introductory elements
of linear algebra. Certain sections and developments do assume some knowledge 
of more advanced concepts of linear algebra, such as eigenvector analysis, or some 
background in sets of real numbers, but the text is structured so that the mainstream 
of the development can be faithfully pursued without reliance on this more advanced 
background material. 

Although the book covers primarily material that is now fairly standard, this edi- 
tion emphasizes methods that are both state-of-the-art and popular. One major in- 
sight is the connection between the purely analytical character of an optimization 
problem, expressed perhaps by properties of the necessary conditions, and the be- 
havior of algorithms used to solve a problem. This was a major theme of the first 
edition of this book and the fourth edition expands and further illustrates this rela- 
tionship. 

As in the earlier editions, the material in this fourth edition is organized into three 
separate parts. Part I is a self-contained introduction to linear programming, a key 
component of optimization theory. The presentation in this part is fairly conven- 
tional, covering the main elements of the underlying theory of linear programming, 
many of the most effective numerical algorithms, and many of its important special 
applications. Part II, which is independent of Part I, covers the theory of uncon- 
strained optimization, including both derivations of the appropriate optimality con- 
ditions and an introduction to basic algorithms. This part of the book explores the 
general properties of algorithms and defines various notions of convergence. Part III 
extends the concepts developed in the second part to constrained optimization
problems. Except for a few isolated sections, this part is also independent of Part I. 
It is possible to go directly into Parts II and III omitting Part I, and, in fact, the 
book has been used in this way in many universities. Each part of the book contains 
enough material to form the basis of a one-quarter course. In either classroom use 
or for self-study, it is important not to overlook the suggested exercises at the end of 
each chapter. The selections generally include exercises of a computational variety 
designed to test one’s understanding of a particular algorithm, a theoretical variety 
designed to test one’s understanding of a given theoretical development, or of the 
variety that extends the presentation of the chapter to new applications or theoretical 
areas. One should attempt at least four or five exercises from each chapter. In pro- 
gressing through the book it would be unusual to read straight through from cover 
to cover. Generally, one will wish to skip around. In order to facilitate this mode, we 
have indicated sections of a specialized or digressive nature with an asterisk*. 

New to this edition is a special Chap. 6 devoted to Conic Linear Programming, a 
powerful generalization of Linear Programming. While the constraint set in a nor- 
mal linear program is defined by a finite number of linear inequalities of finite- 
dimensional vector variables, the constraint set in conic linear programming may be 
defined, for example, as a linear combination of symmetric positive semi-definite 
matrices of a given dimension. Indeed, many conic structures are possible and use- 
ful in a variety of applications. It must be recognized, however, that conic linear 
programming is an advanced topic, requiring special study. 

Another important topic is an accelerated steepest descent method that exhibits
superior convergence properties and, for this reason, has become quite popular.
Proofs of the convergence property for both the standard and accelerated steepest
descent methods are presented in Chap. 8.

As the field of optimization advances, it addresses greater complexity, treats
problems with ever more variables (as in Big Data situations), and ranges over
diverse applications. The field responds to these challenges by developing new
algorithms, building effective software, and expanding overall theory. An example
of a valuable new development is the work on big data problems. Surprisingly,
coordinate descent, with randomly selected coordinates at each step, is quite
effective, as explained in Chap. 8. As another example, some problems are
formulated so that the unknowns can be split into two subgroups, the constraints
are linear, and the objective function is separable with respect to the two groups
of variables. The augmented Lagrangian can be computed, and it is natural to use
an alternating series method. We discuss the alternating direction method with
multipliers as a dual method in Chap. 14. Interestingly, this method is convergent
when the number of partition groups is two, but not for finer partitions.

We wish to thank the many students and researchers who over the years have 
given us comments concerning the book and those who encouraged us to carry out 
this revision. 



D.G. Luenberger, Stanford, CA, USA
Y. Ye, Stanford, CA, USA
January 2015
Chapter 1

Introduction 



1.1 Optimization 



The concept of optimization is now well rooted as a principle underlying the analysis 
of many complex decision or allocation problems. It offers a certain degree of philo- 
sophical elegance that is hard to dispute, and it often offers an indispensable degree 
of operational simplicity. Using this optimization philosophy, one approaches a 
complex decision problem, involving the selection of values for a number of in- 
terrelated variables, by focusing attention on a single objective designed to quantify 
performance and measure the quality of the decision. This one objective is maxi- 
mized (or minimized, depending on the formulation) subject to the constraints that 
may limit the selection of decision variable values. If a suitable single aspect of a 
problem can be isolated and characterized by an objective, be it profit or loss in 
a business setting, speed or distance in a physical problem, expected return in the 
environment of risky investments, or social welfare in the context of government 
planning, optimization may provide a suitable framework for analysis. 

It is, of course, a rare situation in which it is possible to fully represent all the 
complexities of variable interactions, constraints, and appropriate objectives when 
faced with a complex decision problem. Thus, as with all quantitative techniques 
of analysis, a particular optimization formulation should be regarded only as an 
approximation. Skill in modeling, to capture the essential elements of a problem, 
and good judgment in the interpretation of results are required to obtain meaningful 
conclusions. Optimization, then, should be regarded as a tool of conceptualization 
and analysis rather than as a principle yielding the philosophically correct solution. 

Skill and good judgment, with respect to problem formulation and interpretation
of results, are enhanced through concrete practical experience and a thorough
understanding of relevant theory. Problem formulation itself always involves a tradeoff
between the conflicting objectives of building a mathematical model sufficiently 
complex to accurately capture the problem description and building a model that is 
tractable. The expert model builder is facile with both aspects of this tradeoff. One
aspiring to become such an expert must learn to identify and capture the important 
issues of a problem mainly through example and experience; one must learn to 
distinguish tractable models from nontractable ones through a study of available 
technique and theory and by nurturing the capability to extend existing theory to 
new situations. 

This book is centered around a certain optimization structure: that characteristic
of linear and nonlinear programming. Examples of situations leading to this
structure are sprinkled throughout the book, and these examples should help to indicate
how practical problems can often be fruitfully structured in this form. The book
mainly, however, is concerned with the development, analysis, and comparison of 
algorithms for solving general subclasses of optimization problems. This is valuable 
not only for the algorithms themselves, which enable one to solve given problems, 
but also because identification of the collection of structures they most effectively 
solve can enhance one’s ability to formulate problems. 



1.2 Types of Problems 

The content of this book is divided into three major parts: Linear Programming, 
Unconstrained Problems, and Constrained Problems. The last two parts together 
comprise the subject of nonlinear programming. 



Linear Programming 

Linear programming is without doubt the most natural mechanism for formulat- 
ing a vast array of problems with modest effort. A linear programming problem 
is characterized, as the name implies, by linear functions of the unknowns; the 
objective is linear in the unknowns, and the constraints are linear equalities or linear 
inequalities in the unknowns. One familiar with other branches of linear mathe- 
matics might suspect, initially, that linear programming formulations are popular 
because the mathematics is nicer, the theory is richer, and the computation simpler 
for linear problems than for nonlinear ones. But, in fact, these are not the primary 
reasons. In terms of mathematical and computational properties, there are much 
broader classes of optimization problems than linear programming problems that 
have elegant and potent theories and for which effective algorithms are available. 
It seems that the popularity of linear programming lies primarily with the
formulation phase of analysis rather than the solution phase, and for good cause. For
one thing, a great number of constraints and objectives that arise in practice are
indisputably linear. Thus, for example, if one formulates a problem with a budget
constraint restricting the total amount of money to be allocated between two different
commodities, the budget constraint takes the form $x_1 + x_2 \le B$, where $x_i$, $i = 1, 2$,
is the amount allocated to commodity $i$, and $B$ is the budget. Similarly, if the objective
is, for example, maximum weight, then it can be expressed as $w_1 x_1 + w_2 x_2$, where
$w_i$, $i = 1, 2$, is the unit weight of commodity $i$. The overall problem would
be expressed as



$$
\begin{array}{ll}
\text{maximize} & w_1 x_1 + w_2 x_2 \\
\text{subject to} & x_1 + x_2 \le B, \\
& x_1 \ge 0,\ x_2 \ge 0,
\end{array}
$$

which is an elementary linear program. The linearity of the budget constraint is 
extremely natural in this case and does not represent simply an approximation to a 
more general functional form. 
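
To make this concrete, the following minimal Python sketch solves the two-commodity budget problem by comparing the objective at the corner points of the feasible triangle. It relies on the standard fact that a linear program with a bounded, nonempty feasible region attains its optimum at a corner point. The function name and the numerical weights and budget are hypothetical, chosen only for illustration.

```python
def solve_budget_lp(w1, w2, budget):
    """Maximize w1*x1 + w2*x2 subject to x1 + x2 <= budget, x1 >= 0, x2 >= 0.

    Assumes budget >= 0, so the feasible region is the triangle with
    corners (0, 0), (budget, 0), and (0, budget).
    """
    corners = [(0.0, 0.0), (budget, 0.0), (0.0, budget)]
    best = max(corners, key=lambda x: w1 * x[0] + w2 * x[1])
    return best, w1 * best[0] + w2 * best[1]


# Hypothetical data: unit weights 3 and 5, budget 10.
x_opt, value = solve_budget_lp(3.0, 5.0, 10.0)
print(x_opt, value)
```

With these numbers the entire budget goes to the commodity with the larger unit weight, which is what one would expect from the linearity of both the objective and the constraint.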

Another reason that linear forms for constraints and objectives are so popular in 
problem formulation is that they are often the least difficult to define. Thus, even if 
an objective function is not purely linear by virtue of its inherent definition (as in 
the above example), it is often far easier to define it as being linear than to decide on 
some other functional form and convince others that the more complex form is the 
best possible choice. Linearity, therefore, by virtue of its simplicity, often is selected 
as the easy way out or, when seeking generality, as the only functional form that will 
be equally applicable (or nonapplicable) in a class of similar problems. 

Of course, the theoretical and computational aspects do take on a somewhat spe- 
cial character for linear programming problems — the most significant development 
being the simplex method. This algorithm is developed in Chaps. 2 and 3. More re- 
cent interior point methods are nonlinear in character and these are developed in 
Chap. 5. 



Unconstrained Problems 

It may seem that unconstrained optimization problems are so devoid of structural 
properties as to preclude their applicability as useful models of meaningful problems. 
Quite the contrary is true for two reasons. First, it can be argued, quite convincingly, 
that if the scope of a problem is broadened to the consideration of all relevant de- 
cision variables, there may then be no constraints — or put another way, constraints 
represent artificial delimitations of scope, and when the scope is broadened the
constraints vanish. Thus, for example, it may be argued that a budget constraint is not
characteristic of a meaningful problem formulation: since it is always possible to
obtain additional funds by borrowing at some interest rate, a term reflecting the
cost of funds should be incorporated into the objective rather than introducing a
budget constraint. A similar argument applies to constraints describing the
availability of other resources which at some cost (however great) could be supple- 
mented. 

The second reason that many important problems can be regarded as hav- 
ing no constraints is that constrained problems are sometimes easily converted to 
unconstrained problems. For instance, the sole effect of equality constraints is
simply to limit the degrees of freedom, by essentially making some variables functions
of others. These dependencies can sometimes be explicitly characterized, and a new
problem having its number of variables equal to the true degree of freedom can be
determined. As a simple specific example, a constraint of the form $x_1 + x_2 = B$ can
be eliminated by substituting $x_2 = B - x_1$ everywhere else that $x_2$ appears in the
problem.
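
The substitution can be sketched in a few lines of Python. The quadratic objective, the value of $B$, and the simple one-dimensional search below are all assumptions made for illustration; the point is only that, after substituting $x_2 = B - x_1$, the constrained problem in two variables becomes an unconstrained problem in one.

```python
def f(x1, x2):
    # Hypothetical objective, used only to illustrate the substitution.
    return (x1 - 1.0) ** 2 + (x2 - 2.0) ** 2


B = 5.0  # hypothetical right-hand side of the constraint x1 + x2 = B


def reduced(x1):
    """Objective after eliminating x2 through x2 = B - x1."""
    return f(x1, B - x1)


def minimize_1d(phi, lo, hi, iters=200):
    """Ternary search; valid when phi is unimodal on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if phi(a) < phi(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


x1_opt = minimize_1d(reduced, -100.0, 100.0)
x2_opt = B - x1_opt  # recover the eliminated variable
print(x1_opt, x2_opt, f(x1_opt, x2_opt))
```

For this particular objective the reduced function is minimized where $2x_1 = B - 1$, so the search should return values close to $x_1 = 2$ and $x_2 = 3$, and the constraint holds by construction.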

Aside from representing a significant class of practical problems, the study of un- 
constrained problems, of course, provides a stepping stone toward the more general 
case of constrained problems. Many aspects of both theory and algorithms are most 
naturally motivated and verified for the unconstrained case before progressing to the 
constrained case. 



Constrained Problems 

In spite of the arguments given above, many problems met in practice are formulated 
as constrained problems. This is because in most instances a complex problem such 
as, for example, the detailed production policy of a giant corporation, the planning 
of a large government agency, or even the design of a complex device cannot be 
directly treated in its entirety accounting for all possible choices, but instead must be 
decomposed into separate subproblems — each subproblem having constraints that 
are imposed to restrict its scope. Thus, in a planning problem, budget constraints are 
commonly imposed in order to decouple that one problem from a more global one. 
Therefore, one frequently encounters general nonlinear constrained mathematical 
programming problems. 

The general mathematical programming problem can be stated as

$$
\begin{array}{ll}
\text{minimize} & f(\mathbf{x}) \\
\text{subject to} & h_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, m \\
& g_j(\mathbf{x}) \le 0, \quad j = 1, 2, \ldots, p \\
& \mathbf{x} \in S.
\end{array}
$$

In this formulation, $\mathbf{x}$ is an $n$-dimensional vector of unknowns,
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$, and $f$, $h_i$, $i = 1, 2, \ldots, m$, and $g_j$,
$j = 1, 2, \ldots, p$, are real-valued functions of the variables $x_1, x_2, \ldots, x_n$.
The set $S$ is a subset of $n$-dimensional space. The function $f$ is the objective
function of the problem and the equations, inequalities, and set restrictions are
constraints.

Generally, in this book, additional assumptions are introduced in order to make 
the problem smooth in some suitable sense. For example, the functions in the prob- 
lem are usually required to be continuous, or perhaps to have continuous derivatives. 
This ensures that small changes in x lead to small changes in other values
associated with the problem. Also, the set S is not allowed to be arbitrary but usually is
required to be a connected region of n-dimensional space, rather than, for example,
a set of distinct isolated points. This ensures that small changes in x can be made.
Indeed, in a majority of problems treated, the set S is taken to be the entire space; 
there is no set restriction. 

In view of these smoothness assumptions, one might characterize the problems 
treated in this book as continuous variable programming, since we generally discuss
problems where all variables and function values can be varied continuously. In fact, 
this assumption forms the basis of many of the algorithms discussed, which operate 
essentially by making a series of small movements in the unknown x vector. 



1.3 Size of Problems 

One obvious measure of the complexity of a programming problem is its size, 
measured in terms of the number of unknown variables or the number of constraints. 
As might be expected, the size of problems that can be effectively solved has been 
increasing with advancing computing technology and with advancing theory. Today, 
with present computing capabilities, however, it is reasonable to distinguish three 
classes of problems: small-scale problems having about five or fewer unknowns
and constraints; intermediate-scale problems having from about five to a hundred
or a thousand variables; and large-scale problems having perhaps thousands or even
millions of variables and constraints. This classification is not entirely rigid, but
it reflects at least roughly not only size but the basic differences in approach that
accompany problems of different size. As a rough rule, small-scale problems can be
solved by hand or by a small computer. Intermediate-scale problems can be solved
on a personal computer with general purpose mathematical programming codes. 
Large-scale problems require sophisticated codes that exploit special structure and 
usually require large computers. 

Much of the basic theory associated with optimization, particularly in non- 
linear programming, is directed at obtaining necessary and sufficient conditions 
satisfied by a solution point, rather than at questions of computation. This theory 
involves mainly the study of Lagrange multipliers, including the Karush-Kuhn- 
Tucker Theorem and its extensions. It tremendously enhances insight into the phi- 
losophy of constrained optimization and provides satisfactory basic foundations for 
other important disciplines, such as the theory of the firm, consumer economics, 
and optimal control theory. The interpretation of Lagrange multipliers that accom- 
panies this theory is valuable in virtually every optimization setting. As a basis for 
computing numerical solutions to optimization, however, this theory is far from ade- 
quate, since it does not consider the difficulties associated with solving the equations 
resulting from the necessary conditions. 

If it is acknowledged from the outset that a given problem is too large and too 
complex to be efficiently solved by hand (and hence it is acknowledged that a 
computer solution is desirable), then one’s theory should be directed toward devel- 
opment of procedures that exploit the efficiencies of computers. In most cases this 
leads to the abandonment of the idea of solving the set of necessary conditions in
favor of the more direct procedure of searching through the space (in an intelligent 
manner) for ever-improving points. 

Today, search techniques can be effectively applied to more or less general non- 
linear programming problems. Problems of great size, large-scale programming 
problems, can be solved if they possess special structural characteristics, especially 
sparsity, that can be exploited by a solution method. Today linear programming soft- 
ware packages are capable of automatically identifying sparse structure within the 
input data and taking advantage of this sparsity in numerical computation. It is now 
not uncommon to solve linear programs of up to a million variables and constraints, 
as long as the structure is sparse. Problem-dependent methods, where the structure 
is not automatically identified, are largely directed to transportation and network 
flow problems as discussed in the book. 

This book focuses on the aspects of general theory that are most fruitful for 
computation in the widest class of problems. While necessary and sufficient con- 
ditions are examined and their application to small-scale problems is illustrated, our 
primary interest in such conditions is in their role as the core of a broader theory 
applicable to the solution of larger problems. At the other extreme, although some 
instances of structure exploitation are discussed, we focus primarily on the general 
continuous variable programming problem rather than on special techniques for spe- 
cial structures. 



1.4 Iterative Algorithms and Convergence 

The most important characteristic of a high-speed computer is its ability to per- 
form repetitive operations efficiently, and in order to exploit this basic character- 
istic, most algorithms designed to solve large optimization problems are iterative 
in nature. Typically, in seeking a vector that solves the programming problem, an
initial vector $\mathbf{x}_0$ is selected and the algorithm generates an improved vector
$\mathbf{x}_1$. The process is repeated and a still better solution $\mathbf{x}_2$ is found.
Continuing in this fashion, a sequence of ever-improving points
$\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_k, \ldots$ is found that approaches a
solution point $\mathbf{x}^*$. For linear programming problems solved by the simplex method,
the generated sequence is of finite length, reaching the solution point exactly after a 
finite (although initially unspecified) number of steps. For nonlinear programming 
problems or interior-point methods, the sequence generally does not ever exactly 
reach the solution point, but converges toward it. In operation, the process is termi- 
nated when a point sufficiently close to the solution point, for practical purposes, is 
obtained. 
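
The general pattern of such an iterative process can be sketched as follows. The particular update rule (a fixed-step move along the negative gradient), the test function, the step size, and the tolerance are all assumptions made for this illustration and are not prescribed by the discussion above; only the structure of the loop, which generates a sequence of points and stops once a point is close enough for practical purposes, is the point of the sketch.

```python
def grad(x):
    # Gradient of the hypothetical function f(x) = (x[0] - 1)^2 + 2*(x[1] + 3)^2,
    # whose minimizer is x* = (1, -3).
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]


def iterate(x0, step=0.2, tol=1e-8, max_iter=1000):
    """Generate x0, x1, x2, ... until the gradient is smaller than tol."""
    x = list(x0)
    history = [tuple(x)]
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            break  # sufficiently close to a solution for practical purposes
        x = [xi - step * gi for xi, gi in zip(x, g)]
        history.append(tuple(x))
    return x, history


x_final, history = iterate([0.0, 0.0])
print(x_final, len(history) - 1)
```

The sequence never reaches the solution point exactly; the loop simply terminates when the stopping test is met, which mirrors the practical termination rule described above.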

The theory of iterative algorithms can be divided into three (somewhat overlap- 
ping) aspects. The first is concerned with the creation of the algorithms themselves. 
Algorithms are not conceived arbitrarily, but are based on a creative examination 
of the programming problem, its inherent structure, and the efficiencies of digital 
computers. The second aspect is the verification that a given algorithm will in fact 
generate a sequence that converges to a solution point. This aspect is referred to as
global convergence analysis, since it addresses the important question of whether
the algorithm, when initiated far from the solution point, will eventually converge 
to it. The third aspect is referred to as local convergence analysis or complexity 
analysis and is concerned with the rate at which the generated sequence of points 
converges to the solution. One cannot regard a problem as solved simply because 
an algorithm is known which will converge to the solution, since it may require 
an exorbitant amount of time to reduce the error to an acceptable tolerance. It is 
essential when prescribing algorithms that some estimate of the time required be 
available. It is the convergence-rate aspect of the theory that allows some quantita- 
tive evaluation and comparison of different algorithms, and at least crudely, assigns 
a measure of tractability to a problem, as discussed in Sect. 1.1. 

A modern-day technical version of Confucius’ most famous saying, and one 
which represents an underlying philosophy of this book, might be, “One good theory 
is worth a thousand computer runs.” Thus, the convergence properties of an itera- 
tive algorithm can be estimated with confidence either by performing numerous 
computer experiments on different problems or by a simple well-directed theoreti- 
cal analysis. A simple theory, of course, provides invaluable insight as well as the 
desired estimate. 

For linear programming using the simplex method, solid theoretical statements 
on the speed of convergence were elusive, because the method actually converges to 
an exact solution in a finite number of steps. The question is how many steps might 
be required. This question was finally resolved when it was shown that it was possi- 
ble for the number of steps to be exponential in the size of the program. The situa- 
tion is different for interior point algorithms, which essentially treat the problem by 
introducing nonlinear terms, and which therefore do not generally obtain a solution 
in a finite number of steps but instead converge toward a solution. 

For nonlinear programs, including interior point methods applied to linear pro- 
grams, it is meaningful to consider the speed of convergence. There are many 
different classes of nonlinear programming algorithms, each with its own conver- 
gence characteristics. However, in many cases the convergence properties can be 
deduced analytically by fairly simple means, and this analysis is substantiated by 
computational experience. Presentation of convergence analysis, which seems to be 
the natural focal point of a theory directed at obtaining specific answers, is a unique 
feature of this book. 

There are in fact two aspects of convergence-rate theory. The first is generally 
known as complexity analysis and focuses on how fast the method converges over- 
all, distinguishing between polynomial-time algorithms and non-polynomial-time 
algorithms. The second aspect provides more detailed analysis of how fast the 
method converges in the final stages, and can provide comparisons between dif- 
ferent algorithms. Both of these are treated in this book. 

The convergence-rate theory presented has two somewhat surprising but definitely 
pleasing aspects. First, the theory is, for the most part, extremely simple in nature. 
Although initially one might fear that a theory aimed at predicting the speed of 
convergence of a complex algorithm might itself be doubly complex, in fact the 
associated convergence analysis often turns out to be exceedingly elementary,
requiring only a line or two of calculation. Second, a large class of seemingly distinct
algorithms turns out to have a common convergence rate. Indeed, as emphasized 
in the later chapters of the book, there is a canonical rate associated with a given 
programming problem that seems to govern the speed of convergence of many algo- 
rithms when applied to that problem. It is this fact that underlies the potency of the 
theory, allowing definitive comparisons among algorithms to be made even without 
detailed knowledge of the problems to which they will be applied. Together these 
two properties, simplicity and potency, assure convergence analysis a permanent 
position of major importance in mathematical programming theory. 




Part I 

Linear Programming 




Chapter 2 

Basic Properties of Linear Programs 



2.1 Introduction 



A linear program (LP) is an optimization problem in which the objective function 
is linear in the unknowns and the constraints consist of linear equalities and linear 
inequalities. The exact form of these constraints may differ from one problem to an- 
other, but as shown below, any linear program can be transformed into the following 
standard form. 



$$
\begin{array}{ll}
\text{minimize} & c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
\text{subject to} & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\
& a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\
& \qquad \vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m \\
\text{and} & x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0,
\end{array}
\tag{2.1}
$$

where the $b_i$'s, $c_i$'s, and $a_{ij}$'s are fixed real constants, and the $x_i$'s are real
numbers to be determined. We always assume that each equation has been multiplied by
minus unity, if necessary, so that each $b_i \ge 0$.

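In more compact vector notation,¹ this standard problem becomes

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T \mathbf{x} \\
\text{subject to} & \mathbf{A}\mathbf{x} = \mathbf{b} \ \text{ and } \ \mathbf{x} \ge \mathbf{0}.
\end{array}
\tag{2.2}
$$

Here $\mathbf{x}$ is an $n$-dimensional column vector, $\mathbf{c}^T$ is an $n$-dimensional row
vector, $\mathbf{A}$ is an $m \times n$ matrix, and $\mathbf{b}$ is an $m$-dimensional column
vector. The vector inequality $\mathbf{x} \ge \mathbf{0}$ means that each component of
$\mathbf{x}$ is nonnegative.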
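
As a small illustration of the standard form, the sketch below stores the problem data as plain Python lists and checks whether a candidate vector satisfies the equality constraints and the nonnegativity requirement. The helper names, the tolerance, and the numerical data are hypothetical and serve only to show how the pieces of the vector notation fit together.

```python
def objective(c, x):
    """Value of the linear objective c^T x."""
    return sum(cj * xj for cj, xj in zip(c, x))


def is_feasible(A, b, x, tol=1e-9):
    """True if Ax = b (within tol) and every component of x is nonnegative."""
    if any(xj < -tol for xj in x):
        return False
    return all(
        abs(sum(aij * xj for aij, xj in zip(row, x)) - bi) <= tol
        for row, bi in zip(A, b)
    )


# Hypothetical data with m = 2 equations and n = 3 unknowns.
A = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 0.0]]
b = [4.0, 0.0]
c = [1.0, 2.0, 3.0]

x_good = [2.0, 2.0, 0.0]  # satisfies both equations and x >= 0
x_bad = [1.0, 1.0, 1.0]   # violates the first equation, since 1 + 1 + 1 != 4

print(is_feasible(A, b, x_good), objective(c, x_good))
print(is_feasible(A, b, x_bad))
```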



1 See Appendix A for a description of the vector notation used throughout this book. 



Before giving some examples of areas in which linear programming problems 
arise naturally, we indicate how various other forms of linear programs can be con- 
verted to the standard form. 

Example 1 (Slack Variables). Consider the problem 

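$$
\begin{array}{ll}
\text{minimize} & c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
\text{subject to} & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \le b_1 \\
& a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \le b_2 \\
& \qquad \vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \le b_m \\
\text{and} & x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0.
\end{array}
$$

In this case the constraint set is determined entirely by linear inequalities.
The problem may be alternatively expressed as

$$
\begin{array}{ll}
\text{minimize} & c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
\text{subject to} & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n + y_1 = b_1 \\
& a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n + y_2 = b_2 \\
& \qquad \vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n + y_m = b_m \\
\text{and} & x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0, \\
\text{and} & y_1 \ge 0,\ y_2 \ge 0,\ \ldots,\ y_m \ge 0.
\end{array}
$$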

The new nonnegative variables $y_i$ introduced to convert the inequalities to equalities
are called slack variables (or more loosely, slacks). By considering the problem
as one having $n + m$ unknowns $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m$, the problem takes
the standard form. The $m \times (n + m)$ matrix that now describes the linear equality
constraints is of the special form $[\mathbf{A}, \mathbf{I}]$ (that is, its columns can be
partitioned into two sets; the first $n$ columns make up the original $\mathbf{A}$ matrix and
the last $m$ columns make up an $m \times m$ identity matrix).
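
The construction of the augmented matrix can be sketched directly. The function name and the numerical data below are hypothetical; the code simply appends an identity block to the coefficient matrix and zero costs for the slack variables, and then checks that a point satisfying the inequalities, together with its slacks, satisfies the resulting equalities.

```python
def add_slacks(A, b, c):
    """Convert 'minimize c^T x subject to Ax <= b, x >= 0' to equality form.

    Returns the m x (n + m) matrix [A, I], the unchanged right-hand side,
    and the cost vector extended with zeros for the slack variables.
    """
    m = len(A)
    A_std = [list(row) + [1 if i == k else 0 for k in range(m)]
             for i, row in enumerate(A)]
    c_std = list(c) + [0] * m
    return A_std, list(b), c_std


# Hypothetical data: two inequality constraints in two unknowns.
A = [[1, 2],
     [3, 1]]
b = [4, 6]
c = [-1, -1]

A_std, b_std, c_std = add_slacks(A, b, c)

# The point x = (1, 1) satisfies Ax <= b; its slacks are y = b - Ax.
x = [1, 1]
y = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
z = x + y  # the n + m unknowns of the converted problem
residual = [sum(a * zj for a, zj in zip(row, z)) - bi
            for row, bi in zip(A_std, b_std)]
print(A_std, y, residual)
```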

Example 2 (Surplus Variables). If the linear inequalities of Example 1 are reversed
so that a typical inequality is

$$
a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n \ge b_i,
$$

it is clear that this is equivalent to

$$
a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n - y_i = b_i
$$

with $y_i \ge 0$. Variables, such as $y_i$, adjoined in this fashion to convert a “greater than
or equal to” inequality to equality are called surplus variables.

It should be clear that by suitably multiplying by minus unity, and adjoining slack 
and surplus variables, any set of linear inequalities can be converted to standard form 
if the unknown variables are restricted to be nonnegative. 
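
A minimal sketch of this row-by-row conversion is given below. The function name, the string encoding of the constraint type, and the numerical data are assumptions made for illustration. Each "less than or equal to" row receives a slack column with coefficient +1, each "greater than or equal to" row receives a surplus column with coefficient -1, and any row whose right-hand side is negative is multiplied by minus unity.

```python
def to_standard_rows(A, senses, b):
    """Convert rows 'a_i . x (sense) b_i' to equalities with b_i >= 0.

    senses[i] is one of "<=", ">=", "=". The unknowns x are assumed to be
    restricted to be nonnegative already.
    """
    n_extra = sum(1 for s in senses if s != "=")
    rows, rhs = [], []
    k = 0  # index of the next slack or surplus column
    for row, sense, bi in zip(A, senses, b):
        extra = [0] * n_extra
        if sense == "<=":
            extra[k] = 1    # slack variable
            k += 1
        elif sense == ">=":
            extra[k] = -1   # surplus variable
            k += 1
        elif sense != "=":
            raise ValueError("unknown constraint type: " + repr(sense))
        new_row = list(row) + extra
        if bi < 0:          # multiply the equation by minus unity
            new_row = [-a for a in new_row]
            bi = -bi
        rows.append(new_row)
        rhs.append(bi)
    return rows, rhs


# Hypothetical data: x1 + x2 <= 4,  x1 - x2 >= -2,  2*x1 + x2 = 3.
rows, rhs = to_standard_rows(
    [[1, 1], [1, -1], [2, 1]],
    ["<=", ">=", "="],
    [4, -2, 3],
)
print(rows, rhs)
```

In the second row, the surplus form $x_1 - x_2 - y_2 = -2$ is multiplied by minus unity to give $-x_1 + x_2 + y_2 = 2$, so that the right-hand side is nonnegative as assumed in (2.1).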






Example 3 (Free Variables: First Method). If a linear program is given in standard
form except that one or more of the unknown variables is not required to be
nonnegative, the problem can be transformed to standard form by either of two simple
techniques.

To describe the first technique, suppose in (2.1), for example, that the restriction
$x_1 \ge 0$ is not present and hence $x_1$ is free to take on either positive or negative
values. We then write

$$
x_1 = u_1 - v_1, \tag{2.3}
$$

where we require $u_1 \ge 0$ and $v_1 \ge 0$. If we substitute $u_1 - v_1$ for $x_1$ everywhere in
(2.1), the linearity of the constraints is preserved and all variables are now required
to be nonnegative. The problem is then expressed in terms of the $n + 1$ variables
$u_1, v_1, x_2, x_3, \ldots, x_n$.

There is obviously a certain degree of redundancy introduced by this technique,
however, since a constant added to $u_1$ and $v_1$ does not change $x_1$ (that is, the
representation of a given value $x_1$ is not unique). Nevertheless, this does not hinder
the simplex method of solution.

Example 4 ( Free Variables — Second Method). A second approach for converting to 
standard form when $x_1$ is unconstrained in sign is to eliminate $x_1$ together with one
of the constraint equations. Take any one of the $m$ equations in (2.1) which has a
nonzero coefficient for $x_1$. Say, for example,

$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i$, (2.4)

where $a_{i1} \neq 0$. Then $x_1$ can be expressed as a linear combination of the other
variables plus a constant. If this expression is substituted for $x_1$ everywhere in (2.1),
we are led to a new problem of exactly the same form but expressed in terms of
the variables $x_2, x_3, \ldots, x_n$ only. Furthermore, the $i$th equation, used to determine
$x_1$, is now identically zero and it too can be eliminated. This substitution scheme
is valid since any combination of nonnegative variables $x_2, x_3, \ldots, x_n$ leads to a
feasible $x_1$ from (2.4), since the sign of $x_1$ is unrestricted. As a result of this
simplification, we obtain a standard linear program having $n - 1$ variables and $m - 1$
constraint equations. The value of the variable $x_1$ can be determined after solution
through (2.4).

Example 5 ( Specific Case). As a specific instance of the above technique consider 
the problem 



minimize $x_1 + 3x_2 + 4x_3$
subject to $x_1 + 2x_2 + x_3 = 5$
$2x_1 + 3x_2 + x_3 = 6$
$x_2 \geq 0, \; x_3 \geq 0.$

Since $x_1$ is free, we solve for it from the first constraint, obtaining

$x_1 = 5 - 2x_2 - x_3$. (2.5)

Substituting this into the objective and the second constraint, we obtain the equivalent
problem (subtracting five from the objective)

minimize $x_2 + 3x_3$
subject to $x_2 + x_3 = 4$
$x_2 \geq 0, \; x_3 \geq 0,$

which is a problem in standard form. After the smaller problem is solved (the answer
is $x_2 = 4$, $x_3 = 0$) the value for $x_1$ ($x_1 = -3$) can be found from (2.5).
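
The arithmetic of this example can be checked with a few lines of exact rational arithmetic. The sketch below is only a verification aid; the helper names `x1_from`, `original_objective`, and `reduced_objective` are introduced here for illustration. It relies on the fact that the reduced feasible set is a line segment, so a linear objective attains its minimum at one of the two endpoints.

```python
from fractions import Fraction as F

def x1_from(x2, x3):
    # Equation (2.5): x1 = 5 - 2*x2 - x3, from the first constraint.
    return 5 - 2 * x2 - x3

def original_objective(x1, x2, x3):
    return x1 + 3 * x2 + 4 * x3

def reduced_objective(x2, x3):
    # Objective after substituting (2.5), with the constant 5 dropped.
    return x2 + 3 * x3

# The reduced feasible set {x2 + x3 = 4, x2 >= 0, x3 >= 0} is a segment,
# so it suffices to compare its two endpoints.
endpoints = [(F(4), F(0)), (F(0), F(4))]
x2, x3 = min(endpoints, key=lambda p: reduced_objective(*p))
x1 = x1_from(x2, x3)

# The recovered point satisfies both original equality constraints.
assert x1 + 2 * x2 + x3 == 5
assert 2 * x1 + 3 * x2 + x3 == 6
# The two objectives differ by exactly the dropped constant.
assert original_objective(x1, x2, x3) == reduced_objective(x2, x3) + 5
print(x1, x2, x3)
```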



2.2 Examples of Linear Programming Problems 



Linear programming has long proved its merit as a significant model of numerous 
allocation problems and economic phenomena. The continuously expanding litera- 
ture of applications repeatedly demonstrates the importance of linear programming 
as a general framework for problem formulation. In this section we present some 
classic examples of situations that have natural formulations. 

Example 1 ( The Diet Problem). How can we determine the most economical diet 
that satisfies the basic minimum nutritional requirements for good health? Such a 
problem might, for example, be faced by the dietitian of a large army. We assume 
that there are available at the market $n$ different foods and that the $j$th food sells at a
price $c_j$ per unit. In addition there are $m$ basic nutritional ingredients and, to achieve
a balanced diet, each individual must receive at least $b_i$ units of the $i$th nutrient per
day. Finally, we assume that each unit of food $j$ contains $a_{ij}$ units of the $i$th nutrient.

If we denote by $x_j$ the number of units of food $j$ in the diet, the problem then is
to select the $x_j$'s to minimize the total cost

$c_1x_1 + c_2x_2 + \cdots + c_nx_n$

subject to the nutritional constraints

$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \geq b_i, \quad i = 1, \ldots, m,$

and the nonnegativity constraints

$x_1 \geq 0, \; x_2 \geq 0, \; \ldots, \; x_n \geq 0$



on the food quantities. 

This problem can be converted to standard form by subtracting a nonnegative 
surplus variable from the left side of each of the m linear inequalities. The diet 
problem is discussed further in Chap. 4. 

Example 2 ( Manufacturing Problem ). Suppose we own a facility that is capable of 
manufacturing $n$ different products, each of which may require various amounts
of $m$ different resources. Each product can be produced at any level $x_j \geq 0$,
$j = 1, 2, \ldots, n$, and each unit of the $j$th product can sell for $p_j$ dollars and needs
$a_{ij}$ units of the $i$th resource, $i = 1, 2, \ldots, m$. Assuming linearity of the production
facility, if we are given a set of $m$ numbers $b_1, b_2, \ldots, b_m$ describing the available
quantities of the $m$ resources, and we wish to manufacture products at maximum
revenue, our decision problem is a linear program to maximize

$p_1x_1 + p_2x_2 + \cdots + p_nx_n$

subject to the resource constraints

$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \leq b_i, \quad i = 1, \ldots, m$

and the nonnegativity constraints on all production variables.

Example 3 (The Transportation Problem). Quantities $a_1, a_2, \ldots, a_m$, respectively,
of a certain product are to be shipped from each of $m$ locations and received in
amounts $b_1, b_2, \ldots, b_n$, respectively, at each of $n$ destinations. Associated with the
shipping of a unit of product from origin $i$ to destination $j$ is a shipping cost $c_{ij}$. It is
desired to determine the amounts $x_{ij}$ to be shipped between each origin-destination
pair $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$; so as to satisfy the shipping requirements and
minimize the total cost of transportation.

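To formulate this problem as a linear programming problem, we set up the array
shown below:

$$\begin{array}{cccc|c}
x_{11} & x_{12} & \cdots & x_{1n} & a_1 \\
x_{21} & x_{22} & \cdots & x_{2n} & a_2 \\
\vdots & \vdots &        & \vdots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn} & a_m \\
\hline
b_1 & b_2 & \cdots & b_n &
\end{array}$$
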
The $i$th row in this array defines the variables associated with the $i$th origin, while
the $j$th column in this array defines the variables associated with the $j$th destination.
The problem is to place nonnegative variables $x_{ij}$ in this array so that the sum
across the $i$th row is $a_i$, the sum down the $j$th column is $b_j$, and the weighted sum
$\sum_{j=1}^{n} \sum_{i=1}^{m} c_{ij}x_{ij}$, representing the transportation cost, is minimized.

Thus, we have the linear programming problem: 



minimize $\sum_{i,j} c_{ij}x_{ij}$

subject to $\sum_{j=1}^{n} x_{ij} = a_i$ for $i = 1, 2, \ldots, m$ (2.6)

$\sum_{i=1}^{m} x_{ij} = b_j$ for $j = 1, 2, \ldots, n$ (2.7)

$x_{ij} \geq 0$ for $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$.



In order that the constraints (2.6) and (2.7) be consistent, we must, of course,
assume that $\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j$ which corresponds to assuming that the total amount
shipped is equal to the total amount received.

The transportation problem is now clearly seen to be a linear programming prob- 
lem in mn variables. The equations (2.6) and (2.7) can be combined and expressed 
in matrix form in the usual manner and this results in an (m + n) x (mn) coefficient 
matrix consisting of zeros and ones only. 
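
The structure of this coefficient matrix is easy to see in a small case. The following Python sketch is illustrative only; the function name `transportation_matrix` and the column ordering (variable $x_{ij}$ stored at zero-based column `i*n + j`) are assumptions made here, not conventions from the text.

```python
def transportation_matrix(m, n):
    """Coefficient matrix of (2.6)-(2.7), with x_ij at column i*n + j (0-based)."""
    rows = []
    for i in range(m):      # origin rows: sum over j of x_ij = a_i
        rows.append([1 if col // n == i else 0 for col in range(m * n)])
    for j in range(n):      # destination rows: sum over i of x_ij = b_j
        rows.append([1 if col % n == j else 0 for col in range(m * n)])
    return rows


T = transportation_matrix(2, 3)
for row in T:
    print(row)
```

Each column contains exactly two ones, one in an origin row and one in a destination row, which reflects the fact that every shipment leaves one origin and arrives at one destination.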



Fig. 2.1 A network with capacities 



Example 4 (The Maximal Flow Problem). Consider a capacitated network (see 
Fig. 2.1, and Appendix D) in which two special nodes, called the source and the 
sink, are distinguished. Say they are nodes 1 and $m$, respectively. All other nodes
must satisfy the strict conservation requirement; that is, the net flow into these nodes
must be zero. However, the source may have a net outflow and the sink a net inflow.
The outflow $f$ of the source will equal the inflow of the sink as a consequence of
the conservation at all other nodes. A set of arc flows satisfying these conditions
is said to be a flow in the network of value $f$. The maximal flow problem is that
of determining the maximal flow that can be established in such a network. When 
written out, it takes the form 




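maximize $f$

subject to $\sum_{j=1}^{n} x_{1j} - \sum_{j=1}^{n} x_{j1} - f = 0$

$\sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} x_{ji} = 0, \quad i \neq 1, m$ (2.8)

$\sum_{j=1}^{n} x_{mj} - \sum_{j=1}^{n} x_{jm} + f = 0$

$0 \leq x_{ij} \leq k_{ij}$, for all $i, j$,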



where $k_{ij} = 0$ for those no-arc pairs $(i, j)$.

Example 5 (A Warehousing Problem). Consider the problem of operating a warehouse,
by buying and selling the stock of a certain commodity, in order to maximize
profit over a certain length of time. The warehouse has a fixed capacity $C$, and there
is a cost $r$ per unit for holding stock for one period. The price, $p_i$, of the commodity
is known to fluctuate over a number of time periods, say months, indexed by
$i$. In any period the same price holds for both purchase or sale. The warehouse is
originally empty and is required to be empty at the end of the last period.

To formulate this problem, variables are introduced for each time period. In particular,
let $x_i$ denote the level of stock in the warehouse at the beginning of period $i$.
Let $u_i$ denote the amount bought during period $i$, and let $s_i$ denote the amount sold
during period $i$. If there are $n$ periods, the problem is



maximize $\sum_{i=1}^{n} \left( p_i(s_i - u_i) - r x_i \right)$
subject to $x_{i+1} = x_i + u_i - s_i, \quad i = 1, 2, \ldots, n-1$
$0 = x_n + u_n - s_n$
$x_i + z_i = C, \quad i = 2, \ldots, n$
$x_1 = 0, \; x_i \geq 0, \; u_i \geq 0, \; s_i \geq 0, \; z_i \geq 0,$



where $z_i$ is a slack variable. If the constraints are written out explicitly for the case
$n = 3$, they take the form

$-u_1 + s_1 + x_2 = 0$
$-x_2 - u_2 + s_2 + x_3 = 0$
$x_2 + z_2 = C$
$-x_3 - u_3 + s_3 = 0$
$x_3 + z_3 = C$



Note that the coefficient matrix can be partitioned into blocks corresponding to 
the variables of the different time periods. The only blocks that have nonzero entries 
are the diagonal ones and the ones immediately above the diagonal. This structure 
is typical of problems involving time. 
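
To make the roles of the variables concrete, the sketch below evaluates a given buying and selling plan against the warehousing model. It is an illustration under stated assumptions: the function name `evaluate_plan` and the sample prices are hypothetical, and the code only checks feasibility and computes profit for a proposed plan; it does not solve the linear program.

```python
def evaluate_plan(prices, buys, sells, capacity, holding_cost):
    """Return the profit of a buy/sell plan, or None if the plan is infeasible."""
    stock = 0        # x_1 = 0: the warehouse starts empty
    profit = 0
    for p, u, s in zip(prices, buys, sells):
        # Require u_i >= 0, s_i >= 0 and 0 <= x_i <= C at the start of period i.
        if u < 0 or s < 0 or stock < 0 or stock > capacity:
            return None
        profit += p * (s - u) - holding_cost * stock
        stock = stock + u - s            # x_{i+1} = x_i + u_i - s_i
    # The warehouse must be empty after the last period.
    return profit if stock == 0 else None


# Hypothetical three-period data: buy when the price is 1, sell when it is 3.
ok = evaluate_plan([1, 3, 2], [5, 0, 0], [0, 5, 0], capacity=5, holding_cost=1)
too_much = evaluate_plan([1, 3, 2], [6, 0, 0], [0, 6, 0], capacity=5, holding_cost=1)
print(ok, too_much)
```

In the first plan the five units bought in period 1 are held at the start of period 2 (holding cost 5) and sold for 15, giving a profit of 5; the second plan exceeds the capacity and is rejected.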

Example 6 (Linear Classifier and Support Vector Machine). Suppose several 
d-dimensional data points are classified into two distinct classes. For example, two- 
dimensional data points may be grade averages in science and humanities for differ- 
ent students. We also know the academic major of each student, as being in science 
or humanities, which serves as the classification. In general we have vectors $\mathbf{a}_i \in E^d$
for $i = 1, 2, \ldots, n_1$ and vectors $\mathbf{b}_j \in E^d$ for $j = 1, 2, \ldots, n_2$. We wish to find
a hyperplane that separates the $\mathbf{a}_i$'s from the $\mathbf{b}_j$'s. Mathematically we wish to find
$\mathbf{y} \in E^d$ and a number $\beta$ such that

$\mathbf{a}_i^T \mathbf{y} + \beta \geq 1$ for all $i$
$\mathbf{b}_j^T \mathbf{y} + \beta \leq -1$ for all $j$,

where $\{\mathbf{x} : \mathbf{x}^T \mathbf{y} + \beta = 0\}$ is the desired hyperplane, and the separation is defined by
the $+1$ and $-1$. This is a linear program. See Fig. 2.2.

Example 7 (Combinatorial Auction). Suppose there are $m$ mutually exclusive potential
states and only one of them will be true at maturity. For example, the states
may correspond to the winning horse in a race of $m$ horses, or the value of a stock
index falling within $m$ intervals. An auction organizer who establishes a parimutuel
auction is prepared to issue contracts specifying subsets of the $m$ possibilities that
pay $1 if the final state is one of those designated by the contract, and zero otherwise.
There are $n$ participants who may place orders with the organizer for the
purchase of such contracts. An order by the $j$th participant consists of an $m$-vector
$\mathbf{a}_j = (a_{1j}, a_{2j}, \ldots, a_{mj})^T$ where each component is either 0 or 1, a one indicating a
desire to be paid if the corresponding state occurs.




Fig. 2.2 Support vector for data classification 



Accompanying the order is a number $\pi_j$ which is the price limit the participant
is willing to pay for one unit of the order. Finally, the participant also declares the
maximum number $q_j$ of units he or she is willing to accept under these terms.

The auction organizer, after receiving these various orders, must decide how
many contracts to fill. Let $x_j$ be the (real) number of units awarded to the $j$th order.
Then the $j$th participant will pay $\pi_j x_j$. The total amount paid by all participants
is $\boldsymbol{\pi}^T \mathbf{x}$, where $\mathbf{x}$ is the vector of $x_j$'s and $\boldsymbol{\pi}$ is the vector of prices.

If the outcome is the $i$th state, the auction organizer must pay out a total of
$\sum_{j=1}^{n} a_{ij} x_j = (\mathbf{A}\mathbf{x})_i$. The organizer would like to maximize profit in the worst possible
case, and does this by solving the problem

maximize $\boldsymbol{\pi}^T \mathbf{x} - \max_i (\mathbf{A}\mathbf{x})_i$
subject to $\mathbf{0} \leq \mathbf{x} \leq \mathbf{q}$.




This problem can be expressed alternatively as selecting $\mathbf{x}$ and scalar $s$ to

maximize $\boldsymbol{\pi}^T \mathbf{x} - s$
subject to $\mathbf{A}\mathbf{x} - \mathbf{1}s \leq \mathbf{0}$
$\mathbf{0} \leq \mathbf{x} \leq \mathbf{q}$

where $\mathbf{1}$ is the vector of all 1's. Notice that the profit will always be nonnegative,
since $\mathbf{x} = \mathbf{0}$ is feasible.
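
The worst-case objective is simple to evaluate for any proposed allocation. The sketch below is a hedged illustration with made-up data; the function name `worst_case_profit` and the sample orders are assumptions introduced here. It computes revenue minus the largest state payout, which equals the objective value obtained when the scalar $s$ is set to the largest component of $\mathbf{A}\mathbf{x}$.

```python
from fractions import Fraction

def worst_case_profit(A, prices, x):
    """Return pi^T x - max_i (A x)_i for an m x n matrix A of zeros and ones."""
    revenue = sum(p * xj for p, xj in zip(prices, x))
    payouts = [sum(a * xj for a, xj in zip(row, x)) for row in A]  # (A x)_i
    return revenue - max(payouts)


# Hypothetical auction: three states, two orders.
# Order 0 pays in state 0; order 1 pays in states 1 and 2.
A = [[1, 0],
     [0, 1],
     [0, 1]]
prices = [Fraction(3, 5), Fraction(1, 2)]

profit_both = worst_case_profit(A, prices, [10, 10])   # revenue 11, payout 10
profit_one = worst_case_profit(A, prices, [10, 0])     # revenue 6, payout up to 10
profit_none = worst_case_profit(A, prices, [0, 0])     # the feasible point x = 0
print(profit_both, profit_one, profit_none)
```

Filling only one order exposes the organizer to a loss, while x = 0 gives zero profit, which is why the optimal worst-case profit can never be negative.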



2.3 Basic Solutions 

Consider the system of equalities 



$\mathbf{A}\mathbf{x} = \mathbf{b}$, (2.9)

where $\mathbf{x}$ is an $n$-vector, $\mathbf{b}$ an $m$-vector, and $\mathbf{A}$ is an $m \times n$ matrix. Suppose that from
the $n$ columns of $\mathbf{A}$ we select a set of $m$ linearly independent columns (such a set
exists if the rank of $\mathbf{A}$ is $m$). For notational simplicity assume that we select the first
$m$ columns of $\mathbf{A}$ and denote the $m \times m$ matrix determined by these columns by $\mathbf{B}$.
The matrix $\mathbf{B}$ is then nonsingular and we may uniquely solve the equation

$\mathbf{B}\mathbf{x}_B = \mathbf{b}$ (2.10)

for the $m$-vector $\mathbf{x}_B$. By putting $\mathbf{x} = (\mathbf{x}_B, \mathbf{0})$ (that is, setting the first $m$ components
of $\mathbf{x}$ equal to those of $\mathbf{x}_B$ and the remaining components equal to zero), we obtain a
solution to $\mathbf{A}\mathbf{x} = \mathbf{b}$. This leads to the following definition.

Definition. Given the set of $m$ simultaneous linear equations in $n$ unknowns (2.9), let $\mathbf{B}$ be
any nonsingular $m \times m$ submatrix made up of columns of $\mathbf{A}$. Then, if all $n - m$ components of
$\mathbf{x}$ not associated with columns of $\mathbf{B}$ are set equal to zero, the solution to the resulting set of
equations is said to be a basic solution to (2.9) with respect to the basis $\mathbf{B}$. The components
of $\mathbf{x}$ associated with columns of $\mathbf{B}$ are called basic variables.

In the above definition we refer to $\mathbf{B}$ as a basis, since $\mathbf{B}$ consists of $m$ linearly
independent columns that can be regarded as a basis for the space $E^m$. The basic
solution corresponds to an expression for the vector $\mathbf{b}$ as a linear combination of
these basis vectors. This interpretation is discussed further in the next section.

In general, of course, Eq. (2.9) may have no basic solutions. However, we may 
avoid trivialities and difficulties of a nonessential nature by making certain elemen- 
tary assumptions regarding the structure of the matrix A. First, we usually assume 
that $n > m$, that is, the number of variables $x_j$ exceeds the number of equality constraints.
Second, we usually assume that the rows of $\mathbf{A}$ are linearly independent, corresponding
to linear independence of the $m$ equations. A linear dependency among
the rows of $\mathbf{A}$ would lead either to contradictory constraints and hence no solutions
to (2.9), or to a redundancy that could be eliminated. Formally, we explicitly make
the following assumption in our development, unless noted otherwise.

Full Rank Assumption. The $m \times n$ matrix $\mathbf{A}$ has $m < n$, and the $m$ rows of $\mathbf{A}$ are linearly
independent.

Under the above assumption, the system (2.9) will always have a solution and, in 
fact, it will always have at least one basic solution. 

The basic variables in a basic solution are not necessarily all nonzero. This is 
noted by the following definition. 

Definition. If one or more of the basic variables in a basic solution has value zero, that 
solution is said to be a degenerate basic solution. 

We note that in a nondegenerate basic solution the basic variables, and hence the 
basis B, can be immediately identified from the positive components of the solution. 
There is ambiguity associated with a degenerate basic solution, however, since the
zero-valued basic variables and some of the nonbasic variables can be interchanged.
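
The definitions of basic and degenerate basic solutions can be illustrated by direct computation. The following is a minimal sketch in exact rational arithmetic, not a production solver; the helper names `solve_square`, `basic_solution`, and `is_degenerate` and the small system are assumptions introduced for illustration, and column indices are zero-based.

```python
from fractions import Fraction

def solve_square(B, b):
    """Solve B x = b by Gauss-Jordan elimination; return None if B is singular."""
    m = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(B, b)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

def basic_solution(A, b, basis):
    """Basic solution of A x = b for the column indices in `basis`."""
    x_B = solve_square([[row[j] for j in basis] for row in A], b)
    if x_B is None:
        return None                  # the chosen columns do not form a basis
    x = [Fraction(0)] * len(A[0])
    for j, value in zip(basis, x_B):
        x[j] = value
    return x

def is_degenerate(x, basis):
    """True if some basic variable has value zero."""
    return any(x[j] == 0 for j in basis)


# Hypothetical system: x1 + x3 = 2, x2 + x3 = 2.
A = [[1, 0, 1],
     [0, 1, 1]]
b = [2, 2]
for basis in ([0, 1], [0, 2], [1, 2]):
    x = basic_solution(A, b, basis)
    print(basis, x, is_degenerate(x, basis))
```

In this small system the bases {0, 2} and {1, 2} produce the same point (0, 0, 2), which shows the ambiguity described above: the zero-valued basic variable can be exchanged with a nonbasic one without changing the solution.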

So far in the discussion of basic solutions we have treated only the equality con- 
straint (2.9) and have made no reference to positivity constraints on the variables. 
Similar definitions apply when these constraints are also considered. Thus, consider 
now the system of constraints 



$\mathbf{A}\mathbf{x} = \mathbf{b}, \quad \mathbf{x} \geq \mathbf{0}$, (2.11)

which represent the constraints of a linear program in standard form. 

Definition. A vector $\mathbf{x}$ satisfying (2.11) is said to be feasible for these constraints. A feasible
solution to the constraints (2.11) that is also basic is said to be a basic feasible solution;
if this solution is also a degenerate basic solution, it is called a degenerate basic feasible
solution.



2.4 The Fundamental Theorem of Linear Programming 

In this section, through the fundamental theorem of linear programming, we estab- 
lish the primary importance of basic feasible solutions in solving linear programs. 
The method of proof of the theorem is in many respects as important as the result 
itself, since it represents the beginning of the development of the simplex method. 
The theorem (due to Carathéodory) itself shows that it is necessary only to consider
basic feasible solutions when seeking an optimal solution to a linear program
because the optimal value is always achieved at such a solution.

Corresponding to a linear program in standard form 

minimize $\mathbf{c}^T \mathbf{x}$
subject to $\mathbf{A}\mathbf{x} = \mathbf{b}, \quad \mathbf{x} \geq \mathbf{0}$ (2.12)

a feasible solution to the constraints that achieves the minimum value of the objec- 
tive function subject to those constraints is said to be an optimal feasible solution. 
If this solution is basic, it is an optimal basic feasible solution. 



Fundamental Theorem of Linear Programming. Given a linear program in standard form
(2.12) where $\mathbf{A}$ is an $m \times n$ matrix of rank $m$,

i) if there is a feasible solution, there is a basic feasible solution;

ii) if there is an optimal feasible solution, there is an optimal basic feasible solution.

Proof of (i). Denote the columns of $\mathbf{A}$ by $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$. Suppose $\mathbf{x} = (x_1, x_2, \ldots, x_n)$
is a feasible solution. Then, in terms of the columns of $\mathbf{A}$, this solution satisfies:

$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b}$.

Assume that exactly $p$ of the variables $x_i$ are greater than zero, and for convenience,
that they are the first $p$ variables. Thus

$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_p\mathbf{a}_p = \mathbf{b}$. (2.13)

There are now two cases, corresponding as to whether the set $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_p$ is
linearly independent or linearly dependent.

Case 1: Assume $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_p$ are linearly independent. Then clearly, $p \leq m$.
If $p = m$, the solution is basic and the proof is complete. If $p < m$, then, since $\mathbf{A}$
has rank $m$, $m - p$ vectors can be found from the remaining $n - p$ vectors so that
the resulting set of $m$ vectors is linearly independent. (See Exercise 12.) Assigning
the value zero to the corresponding $m - p$ variables yields a (degenerate) basic
feasible solution.

Case 2: Assume $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_p$ are linearly dependent. Then there is a nontrivial
linear combination of these vectors that is zero. Thus there are constants
$y_1, y_2, \ldots, y_p$, at least one of which can be assumed to be positive, such that

$y_1\mathbf{a}_1 + y_2\mathbf{a}_2 + \cdots + y_p\mathbf{a}_p = \mathbf{0}$. (2.14)

Multiplying this equation by a scalar $\varepsilon$ and subtracting it from (2.13), we obtain

$(x_1 - \varepsilon y_1)\mathbf{a}_1 + (x_2 - \varepsilon y_2)\mathbf{a}_2 + \cdots + (x_p - \varepsilon y_p)\mathbf{a}_p = \mathbf{b}$. (2.15)

This equation holds for every $\varepsilon$, and for each $\varepsilon$ the components $x_i - \varepsilon y_i$ correspond to
a solution of the linear equalities, although they may violate $x_i - \varepsilon y_i \geq 0$. Denoting
$\mathbf{y} = (y_1, y_2, \ldots, y_p, 0, 0, \ldots, 0)$, we see that for any $\varepsilon$

$\mathbf{x} - \varepsilon\mathbf{y}$ (2.16)

is a solution to the equalities. For $\varepsilon = 0$, this reduces to the original feasible solution.
As $\varepsilon$ is increased from zero, the various components increase, decrease, or remain
constant, depending upon whether the corresponding $y_i$ is negative, positive, or zero.
Since we assume at least one $y_i$ is positive, at least one component will decrease as $\varepsilon$
is increased. We increase $\varepsilon$ to the first point where one or more components become
zero. Specifically, we set

$\varepsilon = \min \{x_i / y_i : y_i > 0\}$.

For this value of $\varepsilon$ the solution given by (2.16) is feasible and has at most $p - 1$
positive variables. Repeating this process if necessary, we can eliminate positive
variables until we have a feasible solution with corresponding columns that are linearly
independent. At that point Case 1 applies. ∎
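
The reduction used in Case 2 is constructive, and it can be followed step by step in code. The sketch below is an illustration under stated assumptions rather than a definitive implementation: the function names `null_vector` and `to_basic_feasible` and the sample data are introduced here, exact rational arithmetic is used to avoid rounding questions, and the result is a feasible point whose positive components have linearly independent columns (extending to a full basis as in Case 1 is not shown).

```python
from fractions import Fraction

def null_vector(cols):
    """Return y != 0 with sum_k y[k]*cols[k] = 0, or None if cols are independent."""
    p = len(cols)
    m = len(cols[0]) if cols else 0
    M = [[Fraction(cols[k][i]) for k in range(p)] for i in range(m)]  # m x p
    pivots = []                    # pivots[r] is the pivot column of row r
    r = 0
    for c in range(p):
        pr = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pr is None:
            # Column c is a combination of the pivot columns found so far.
            y = [Fraction(0)] * p
            y[c] = Fraction(1)
            for row, pc in enumerate(pivots):
                y[pc] = -M[row][c]
            return y
        M[r], M[pr] = M[pr], M[r]
        M[r] = [v / M[r][c] for v in M[r]]
        for i in range(m):
            if i != r:
                M[i] = [vi - M[i][c] * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return None

def to_basic_feasible(A, x):
    """Shrink the support of a feasible x until its columns are independent."""
    x = [Fraction(v) for v in x]
    while True:
        support = [j for j, v in enumerate(x) if v > 0]
        y = null_vector([[row[j] for row in A] for j in support])
        if y is None:
            return x                         # Case 1 applies
        if all(v <= 0 for v in y):
            y = [-v for v in y]              # ensure some y_j is positive
        eps = min(x[j] / yk for j, yk in zip(support, y) if yk > 0)
        for j, yk in zip(support, y):
            x[j] -= eps * yk                 # move to x - eps*y as in (2.16)


# Hypothetical data: x = (1, 1, 2, 2) is feasible but not basic.
A = [[1, 1, 1, 0],
     [1, 3, 0, 1]]
x_bfs = to_basic_feasible(A, [1, 1, 2, 2])
print(x_bfs)
```

For this data the first dependency found is y = (-3/2, 1/2, 1, 0) with step 2, which drives two components to zero at once and leaves the point (4, 0, 0, 2).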

Proof of (ii). Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ be an optimal feasible solution and, as in
the proof of (i) above, suppose there are exactly $p$ positive variables $x_1, x_2, \ldots, x_p$.
Again there are two cases; and Case 1, corresponding to linear independence, is
exactly the same as before.

Case 2 also goes exactly the same as before, but it must be shown that for any
$\varepsilon$ the solution (2.16) is optimal. To show this, note that the value of the solution
$\mathbf{x} - \varepsilon\mathbf{y}$ is

$\mathbf{c}^T\mathbf{x} - \varepsilon\mathbf{c}^T\mathbf{y}$. (2.17)

For $\varepsilon$ sufficiently small in magnitude, $\mathbf{x} - \varepsilon\mathbf{y}$ is a feasible solution for positive or
negative values of $\varepsilon$. Thus we conclude that $\mathbf{c}^T\mathbf{y} = 0$. For, if $\mathbf{c}^T\mathbf{y} \neq 0$, an $\varepsilon$ of small
magnitude and proper sign could be determined so as to render (2.17) smaller than
$\mathbf{c}^T\mathbf{x}$ while maintaining feasibility. This would violate the assumption of optimality
of $\mathbf{x}$ and hence we must have $\mathbf{c}^T\mathbf{y} = 0$.

Having established that the new feasible solution with fewer positive components
is also optimal, the remainder of the proof may be completed exactly as in part (i). ∎



This theorem reduces the task of solving a linear program to that of searching 
over basic feasible solutions. Since for a problem having $n$ variables and $m$ constraints
there are at most

$\frac{n!}{m!\,(n-m)!}$

basic solutions (corresponding to the number of ways of selecting m of n columns), 
there are only a finite number of possibilities. Thus the fundamental theorem yields 
an obvious, but terribly inefficient, finite search technique. By expanding upon the 
technique of proof as well as the statement of the fundamental theorem, the efficient 
simplex procedure is derived. 
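
The finite search just described can be written down directly. The sketch below is meant only to make the idea concrete; it is the inefficient enumeration, not the simplex method. The function names `solve_square` and `brute_force_lp` and the sample problem are assumptions introduced here, and the search is only meaningful when the problem is known to have an optimal feasible solution, as required by part (ii) of the theorem.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(B, b):
    """Solve B x = b by Gauss-Jordan elimination; return None if B is singular."""
    m = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(B, b)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

def brute_force_lp(c, A, b):
    """Minimize c.x over the basic feasible solutions of A x = b, x >= 0."""
    m, n = len(A), len(A[0])
    best = None
    for basis in combinations(range(n), m):      # at most n!/(m!(n-m)!) choices
        x_B = solve_square([[row[j] for j in basis] for row in A], b)
        if x_B is None or any(v < 0 for v in x_B):
            continue                             # singular basis or infeasible
        x = [Fraction(0)] * n
        for j, v in zip(basis, x_B):
            x[j] = v
        value = sum(cj * xj for cj, xj in zip(c, x))
        if best is None or value < best[0]:
            best = (value, x)
    return best


# Hypothetical problem in standard form (x3, x4 are slack variables):
# minimize -x1 - 2*x2  subject to  x1 + x2 + x3 = 4,  x1 + 3*x2 + x4 = 6.
c = [-1, -2, 0, 0]
A = [[1, 1, 1, 0],
     [1, 3, 0, 1]]
b = [4, 6]
value, x_opt = brute_force_lp(c, A, b)
print(value, x_opt)
```

Of the six candidate bases in this example, four give basic feasible solutions and the best of them is (3, 1, 0, 0) with value -5. The number of candidates grows combinatorially with the problem size, which is the inefficiency the simplex method avoids.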

It should be noted that the proof of the fundamental theorem given above is of 
a simple algebraic character. In the next section the geometric interpretation of this 
theorem is explored in terms of the general theory of convex sets. Although the 
geometric interpretation is aesthetically pleasing and theoretically important, the 
reader should bear in mind, lest one be diverted by the somewhat more advanced 
arguments employed, the underlying elementary level of the fundamental theorem. 




2.5 Relations to Convexity

Our development to this point, including the above proof of the fundamental theorem,
has been based only on elementary properties of systems of linear equations.
These results, however, have interesting interpretations in terms of the theory of
convex sets that can lead not only to an alternative derivation of the fundamental
theorem, but also to a clearer geometric understanding of the result. The main
link between the algebraic and geometric theories is the formal relation between
basic feasible solutions of linear inequalities in standard form and extreme points
of polytopes. We establish this correspondence as follows. The reader is referred to
Appendix B for a more complete summary of concepts related to convexity, but the
definition of an extreme point is stated here.

Definition. A point $\mathbf{x}$ in a convex set $C$ is said to be an extreme point of $C$ if there are no
two distinct points $\mathbf{x}_1$ and $\mathbf{x}_2$ in $C$ such that $\mathbf{x} = \alpha\mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2$ for some $\alpha$, $0 < \alpha < 1$.

An extreme point is thus a point that does not lie strictly within a line segment 
connecting two other points of the set. The extreme points of a triangle, for example, 
are its three vertices. 

Theorem (Equivalence of Extreme Points and Basic Solutions). Let $\mathbf{A}$ be an $m \times n$ matrix
of rank $m$ and $\mathbf{b}$ an $m$-vector. Let $K$ be the convex polytope consisting of all $n$-vectors $\mathbf{x}$
satisfying

$\mathbf{A}\mathbf{x} = \mathbf{b}, \quad \mathbf{x} \geq \mathbf{0}$. (2.18)

A vector $\mathbf{x}$ is an extreme point of $K$ if and only if $\mathbf{x}$ is a basic feasible solution to (2.18).

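Proof. Suppose first that $\mathbf{x} = (x_1, x_2, \ldots, x_m, 0, 0, \ldots, 0)$ is a basic feasible
solution to (2.18). Then

$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_m\mathbf{a}_m = \mathbf{b}$,

where $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$, the first $m$ columns of $\mathbf{A}$, are linearly independent. Suppose
that $\mathbf{x}$ could be expressed as a convex combination of two other points in $K$; say,
$\mathbf{x} = \alpha\mathbf{y} + (1 - \alpha)\mathbf{z}$, $0 < \alpha < 1$, $\mathbf{y} \neq \mathbf{z}$. Since all components of $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$ are nonnegative
and since $0 < \alpha < 1$, it follows immediately that the last $n - m$ components of $\mathbf{y}$ and
$\mathbf{z}$ are zero. Thus, in particular, we have

$y_1\mathbf{a}_1 + y_2\mathbf{a}_2 + \cdots + y_m\mathbf{a}_m = \mathbf{b}$

and

$z_1\mathbf{a}_1 + z_2\mathbf{a}_2 + \cdots + z_m\mathbf{a}_m = \mathbf{b}$.

Since the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$ are linearly independent, however, it follows that
$\mathbf{x} = \mathbf{y} = \mathbf{z}$ and hence $\mathbf{x}$ is an extreme point of $K$.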

Conversely, assume that $\mathbf{x}$ is an extreme point of $K$. Let us assume that the
nonzero components of $\mathbf{x}$ are the first $k$ components. Then

$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_k\mathbf{a}_k = \mathbf{b}$,

with $x_i > 0$, $i = 1, 2, \ldots, k$. To show that $\mathbf{x}$ is a basic feasible solution it must be
shown that the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_k$ are linearly independent. We do this by
contradiction. Suppose $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_k$ are linearly dependent. Then there is a nontrivial
linear combination that is zero:

$y_1\mathbf{a}_1 + y_2\mathbf{a}_2 + \cdots + y_k\mathbf{a}_k = \mathbf{0}$.

Define the $n$-vector $\mathbf{y} = (y_1, y_2, \ldots, y_k, 0, 0, \ldots, 0)$. Since $x_i > 0$, $1 \leq i \leq k$, it is
possible to select $\varepsilon$ such that

$\mathbf{x} + \varepsilon\mathbf{y} \geq \mathbf{0}, \quad \mathbf{x} - \varepsilon\mathbf{y} \geq \mathbf{0}$.

We then have $\mathbf{x} = \frac{1}{2}(\mathbf{x} + \varepsilon\mathbf{y}) + \frac{1}{2}(\mathbf{x} - \varepsilon\mathbf{y})$ which expresses $\mathbf{x}$ as a convex combination of
two distinct vectors in $K$. This cannot occur, since $\mathbf{x}$ is an extreme point of $K$. Thus
$\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_k$ are linearly independent and $\mathbf{x}$ is a basic feasible solution. (Although
if $k < m$, it is a degenerate basic feasible solution.) ∎

This correspondence between extreme points and basic feasible solutions enables 
us to prove certain geometric properties of the convex polytope K defining the con- 
straint set of a linear programming problem. 

Corollary 1. If the convex set K corresponding to (2.18) is nonempty, it has at least one 
extreme point. 

Proof. This follows from the first part of the Fundamental Theorem and the Equivalence
Theorem above. ∎

Corollary 2. If there is a finite optimal solution to a linear programming problem, there is 
a finite optimal solution which is an extreme point of the constraint set. 

Corollary 3. The constraint set K corresponding to (2.18) possesses at most a finite number 
of extreme points. 

Proof. There are obviously only a finite number of basic solutions obtained by
selecting $m$ basis vectors from the $n$ columns of $\mathbf{A}$. The extreme points of $K$ are
a subset of these basic solutions. ∎

Finally, we come to the special case which occurs most frequently in practice and 
which in some sense is characteristic of well-formulated linear programs — the case 
where the constraint set K is nonempty and bounded. In this case we combine the 
results of the Equivalence Theorem and Corollary 3 above to obtain the following 
corollary. 

Corollary 4. If the convex polytope $K$ corresponding to (2.18) is bounded, then $K$ is a convex
polyhedron, that is, $K$ consists of points that are convex combinations of a finite number
of points.



Some of these results are illustrated by the following examples: 



Example 1. Consider the constraint set in $E^3$ defined by

$x_1 + x_2 + x_3 = 1$
$x_1 \geq 0, \; x_2 \geq 0, \; x_3 \geq 0.$

This set is illustrated in Fig. 2.3. It has three extreme points, corresponding to the
three basic solutions to $x_1 + x_2 + x_3 = 1$.

Example 2. Consider the constraint set in $E^3$ defined by

$x_1 + x_2 + x_3 = 1$
$2x_1 + 3x_2 = 1$
$x_1 \geq 0, \; x_2 \geq 0, \; x_3 \geq 0.$



Fig. 2.3 Feasible set for Example 1



This set is illustrated in Fig. 2.4. It has two extreme points, corresponding to the 
two basic feasible solutions. Note that the system of equations itself has three basic 
solutions, (2, -1, 0), (1/2, 0, 1/2), (0, 1/3, 2/3), the first of which is not feasible.

Example 3. Consider the constraint set in $E^2$ defined in terms of the inequalities

$x_1 + \frac{8}{3}x_2 \leq 4$
$x_1 + x_2 \leq 2$
$2x_1 \leq 3$
$x_1 \geq 0, \; x_2 \geq 0.$

This set is illustrated in Fig. 2.5. We see by inspection that this set has five extreme
points. In order to compare this example with our general results we must
introduce slack variables to yield the equivalent set in $E^5$:

$x_1 + \frac{8}{3}x_2 + x_3 = 4$
$x_1 + x_2 + x_4 = 2$
$2x_1 + x_5 = 3$
$x_1 \geq 0, \; x_2 \geq 0, \; x_3 \geq 0, \; x_4 \geq 0, \; x_5 \geq 0.$

A basic solution for this system is obtained by setting any two variables to zero and
solving for the remaining three. As indicated in Fig. 2.5, each edge of the figure
corresponds to one variable being zero, and the extreme points are the points where
two variables are zero.



Fig. 2.4 Feasible set for Example 2



The last example illustrates that even when not expressed in standard form the 
extreme points of the set defined by the constraints of a linear program correspond to 
the possible solution points. This can be illustrated more directly by including the 
objective function in the figure as well. Suppose, for example, that in Example 3
the objective function to be minimized is $-2x_1 - x_2$. The set of points satisfying
$-2x_1 - x_2 = z$ for fixed $z$ is a line. As $z$ varies, different parallel lines are obtained
as shown in Fig. 2.6. The optimal value of the linear program is the smallest value 
of z for which the corresponding line has a point in common with the feasible set. 
It should be reasonably clear, at least in two dimensions, that the points of solution 
will always include an extreme point. In the figure this occurs at the point (3/2, 1/2) 
with z = -7/2. 
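
The geometric picture for Example 3 can be checked numerically. The sketch below is a verification aid only; the helper names `intersection` and `feasible` are introduced here for illustration. It intersects every pair of the five boundary lines with Cramer's rule, keeps the intersection points that satisfy all the inequalities, and evaluates the objective $-2x_1 - x_2$ at each of them.

```python
from fractions import Fraction as F
from itertools import combinations

# Each entry (a1, a2, rhs) encodes the half-plane a1*x1 + a2*x2 <= rhs.
lines = [
    (F(1), F(8, 3), F(4)),     # x1 + (8/3) x2 <= 4
    (F(1), F(1), F(2)),        # x1 + x2 <= 2
    (F(2), F(0), F(3)),        # 2 x1 <= 3
    (F(-1), F(0), F(0)),       # x1 >= 0
    (F(0), F(-1), F(0)),       # x2 >= 0
]

def intersection(l1, l2):
    """Intersection of two boundary lines by Cramer's rule, or None if parallel."""
    (a, b, r1), (c, d, r2) = l1, l2
    det = a * d - b * c
    if det == 0:
        return None
    return ((r1 * d - b * r2) / det, (a * r2 - r1 * c) / det)

def feasible(p):
    return all(a1 * p[0] + a2 * p[1] <= rhs for a1, a2, rhs in lines)

vertices = set()
for l1, l2 in combinations(lines, 2):
    p = intersection(l1, l2)
    if p is not None and feasible(p):
        vertices.add(p)

best = min(vertices, key=lambda p: -2 * p[0] - p[1])
z = -2 * best[0] - best[1]
print(len(vertices), best, z)
```

Each surviving intersection is a point where two of the five variables of the slack form are zero, in agreement with the description of the extreme points given for Example 3.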



2.6 Exercises

1. Convert the following problems to standard form:

(a) minimize $x + 2y + 3z$
subject to $2 \leq x + y \leq 3$
$4 \leq x + z \leq 5$
$x \geq 0, \; y \geq 0, \; z \geq 0.$

(b) minimize $x + y + z$
subject to $x + 2y + 3z = 10$
$x \geq 1, \; y \geq 2, \; z \geq 1.$



Fig. 2.5 Feasible set for Example 3



2. A manufacturer wishes to produce an alloy that is, by weight, 30 % metal A and 
70 % metal B. Five alloys are available at various prices as indicated below: 



Alloy        1      2      3      4      5
%A          10     25     50     75     95
%B          90     75     50     25      5
Price/lb    $5     $4     $3     $2     $1.50



The desired alloy will be produced by combining some of the other alloys. The
manufacturer wishes to find the amounts of the various alloys needed and to
determine the least expensive combination. Formulate this problem as a linear
program.

3. An oil refinery has two sources of crude oil: a light crude that costs $35/barrel
and a heavy crude that costs $30/barrel. The refinery produces gasoline, heating
oil, and jet fuel from crude in the amounts per barrel indicated in the following
table:



               Gasoline    Heating oil    Jet fuel
Light crude    0.3         0.2            0.3
Heavy crude    0.3         0.4            0.2



The refinery has contracted to supply 900,000 barrels of gasoline, 800,000 bar- 
rels of heating oil, and 500,000 barrels of jet fuel. The refinery wishes to find 
the amounts of light and heavy crude to purchase so as to be able to meet its 
obligations at minimum cost. Formulate this problem as a linear program. 

4. A small firm specializes in making five types of spare automobile parts. Each 
part is first cast from iron in the casting shop and then sent to the finishing shop 
where holes are drilled, surfaces are turned, and edges are ground. The required 
worker-hours (per 100 units) for each of the parts of the two shops are shown 
below: 



Part         1    2    3    4    5
Casting      2    1    3    3    1
Finishing    3    2    2    1    1



The profits from the parts are $30, $20, $40, $25, and $10 (per 100 units),
respectively. The capacities of the casting and finishing shops over the next
month are 700 and 1,000 worker-hours, respectively. Formulate the problem of
determining the quantities of each spare part to be made during the month so as
to maximize profit.

5. Convert the following problem to standard form and solve: 

maximize $x_1 + 4x_2 + x_3$
subject to $2x_1 - 2x_2 + x_3 = 4$
$x_1 - x_3 = 1$
$x_2 \geq 0, \; x_3 \geq 0.$



6. A large textile firm has two manufacturing plants, two sources of raw material, 
and three market centers. The transportation costs between the sources and the 
plants and between the plants and the markets are as follows: 

                 Plant
                 A            B
Source 1     $1/ton       $1.50/ton
Source 2     $2/ton       $1.50/ton

                 Market
                 1            2            3
Plant A      $4/ton       $2/ton       $1/ton
Plant B      $3/ton       $4/ton       $2/ton



Ten tons are available from source 1 and 15 tons from source 2. The three market 
centers require 8 tons, 14 tons, and 3 tons. The plants have unlimited processing 
capacity. 

(a) Formulate the problem of finding the shipping patterns from sources to 
plants to markets that minimizes the total transportation cost. 

(b) Reduce the problem to a single standard transportation problem with two 
sources and three destinations. (Hint: Find minimum cost paths from sources 
to markets.) 

(c) Suppose that plant A has a processing capacity of 8 tons, and plant B has 
a processing capacity of 7 tons. Show how to reduce the problem to two 
separate standard transportation problems. 

7. A businessman is considering an investment project. The project has a lifetime 
of 4 years, with cash flows of -$100,000, +$50,000, +$70,000, and +$30,000 
in each of the 4 years, respectively. At any time he may borrow funds at the 
rates of 12%, 22%, and 34% (total) for 1, 2, or 3 periods, respectively. He may
loan funds at 10% per period. He calculates the present value of a project as
the maximum amount of money he would pay now, to another party, for the
project, assuming that he has no cash on hand and must borrow and lend to pay
the other party and operate the project while maintaining a nonnegative cash
balance after all debts are paid. Formulate the project valuation problem in a
linear programming framework.

8. Convert the following problem to a linear program in standard form: 

$$
\begin{array}{ll}
\text{minimize} & |x| + |y| + |z| \\
\text{subject to} & x + y \le 1 \\
 & 2x + z = 3.
\end{array}
$$

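9. A class of piecewise linear functions can be represented as
$f(\mathbf{x}) = \text{Maximum}\,(\mathbf{c}_1^T\mathbf{x} + d_1,\ \mathbf{c}_2^T\mathbf{x} + d_2,\ \ldots,\ \mathbf{c}_p^T\mathbf{x} + d_p)$.
For such a function $f$, consider the problem

$$
\begin{array}{ll}
\text{minimize} & f(\mathbf{x}) \\
\text{subject to} & \mathbf{A}\mathbf{x} = \mathbf{b},\ \mathbf{x} \ge \mathbf{0}.
\end{array}
$$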

Show how to convert this problem to a linear programming problem. 

10. A small computer manufacturing company forecasts the demand over the next 
n months to be $d_i$, $i = 1, 2, \ldots, n$. In any month it can produce r units, using
regular production, at a cost of b dollars per unit. By using overtime, it can
produce additional units at c dollars per unit, where c > b. The firm can store 
units from month to month at a cost of s dollars per unit per month. Formulate 
the problem of determining the production schedule that minimizes cost. (Hint: 
See Exercise 9.) 

11. Discuss the situation of a linear program that has one or more columns of the A
matrix equal to zero. Consider both the case where the corresponding variables 
are required to be nonnegative and the case where some are free. 

12. Suppose that the matrix $\mathbf{A} = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)$ has rank m, and that for some
p < m, $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_p$ are linearly independent. Show that m - p vectors
from the remaining n - p vectors can be adjoined to form a set of m linearly
independent vectors.

13. Suppose that x is a feasible solution to the linear program (2.12), with A an 
$m \times n$ matrix of rank m. Show that there is a feasible solution y having the same
value (that is, $\mathbf{c}^T\mathbf{y} = \mathbf{c}^T\mathbf{x}$) and having at most m + 1 positive components.

14. What are the basic solutions of Example 3, Sect. 2.5? 

15. Let $S$ be a convex set in $E^n$ and $S^*$ a convex set in $E^m$. Suppose $\mathbf{T}$ is an $m \times n$
matrix that establishes a one-to-one correspondence between $S$ and $S^*$, i.e., for
every $\mathbf{s} \in S$ there is $\mathbf{s}^* \in S^*$ such that $\mathbf{T}\mathbf{s} = \mathbf{s}^*$, and for every $\mathbf{s}^* \in S^*$ there is a
single $\mathbf{s} \in S$ such that $\mathbf{T}\mathbf{s} = \mathbf{s}^*$. Show that there is a one-to-one correspondence
between extreme points of $S$ and $S^*$.

16. Consider the two linear programming problems in Example 1, Sect. 2.1, one 
in $E^n$ and the other in $E^{n+m}$. Show that there is a one-to-one correspondence
between extreme points of these two problems. 



Chapter 3

The Simplex Method 



The idea of the simplex method is to proceed from one basic feasible solution (that 
is, one extreme point) of the constraint set of a problem in standard form to another, 
in such a way as to continually decrease the value of the objective function until a 
minimum is reached. The results of Chap. 2 assure us that it is sufficient to consider 
only basic feasible solutions in our search for an optimal feasible solution. This 
chapter demonstrates that an efficient method for moving among basic solutions to 
the minimum can be constructed. 

In the first five sections of this chapter the simplex machinery is developed from 
a careful examination of the system of linear equations that defines the constraints 
and the basic feasible solutions of the system. This approach, which focuses on 
individual variables and their relation to the system, is probably the simplest, but 
unfortunately is not easily expressed in compact form. In the last few sections of 
the chapter, the simplex method is viewed from a matrix theoretic approach, which 
focuses on all variables together. This more sophisticated viewpoint leads to a com- 
pact notational representation, increased insight into the simplex process, and to 
alternative methods for implementation. 



3.1 Pivots 

To obtain a firm grasp of the simplex procedure, it is essential that one first under- 
stand the process of pivoting in a set of simultaneous linear equations. There are two 
dual interpretations of the pivot procedure. 
First Interpretation

Consider the set of simultaneous linear equations 



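$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
\tag{3.1}
$$

where $m \le n$. In matrix form we write this as

$$
\mathbf{A}\mathbf{x} = \mathbf{b}. \tag{3.2}
$$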



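In the space $E^n$ we interpret this as a collection of m linear relations that must be
satisfied by a vector $\mathbf{x}$. Thus, denoting by $\mathbf{a}^i$ the ith row of $\mathbf{A}$, we may express (3.1) as:

$$
\begin{aligned}
\mathbf{a}^1\mathbf{x} &= b_1 \\
\mathbf{a}^2\mathbf{x} &= b_2 \\
&\ \,\vdots \\
\mathbf{a}^m\mathbf{x} &= b_m.
\end{aligned}
\tag{3.3}
$$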



This corresponds to the most natural interpretation of (3.1) as a set of m equations. 

If m < n and the equations are linearly independent, then there is not a unique
solution but a whole linear variety of solutions (see Appendix B). A unique solution
results, however, if n - m additional independent linear equations are adjoined. For
example, we might specify n - m equations of the form $\mathbf{e}^k\mathbf{x} = 0$, where $\mathbf{e}^k$ is the kth
unit vector (which is equivalent to $x_k = 0$), in which case we obtain a basic solution
to (3.1). Different basic solutions are obtained by imposing different additional
equations of this special form.

If the equations (3.3) are linearly independent, we may replace a given equation by any
nonzero multiple of itself plus any linear combination of the other equations in the 
system. This leads to the well-known Gaussian reduction schemes, whereby mul- 
tiples of equations are systematically subtracted from one another to yield either a 
triangular or canonical form. It is well known, and easily proved, that if the first m 
columns of A are linearly independent, the system (3.1) can, by a sequence of such 
multiplications and subtractions, be converted to the following canonical form: 



$$
\begin{aligned}
x_1 + \bar{a}_{1(m+1)}x_{m+1} + \bar{a}_{1(m+2)}x_{m+2} + \cdots + \bar{a}_{1n}x_n &= \bar{a}_{10} \\
x_2 + \bar{a}_{2(m+1)}x_{m+1} + \bar{a}_{2(m+2)}x_{m+2} + \cdots + \bar{a}_{2n}x_n &= \bar{a}_{20} \\
&\ \,\vdots \\
x_m + \bar{a}_{m(m+1)}x_{m+1} + \bar{a}_{m(m+2)}x_{m+2} + \cdots + \bar{a}_{mn}x_n &= \bar{a}_{m0}.
\end{aligned}
\tag{3.4}
$$

Corresponding to this canonical representation of the system, the variables
$x_1, x_2, \ldots, x_m$ are called basic and the other variables are nonbasic. The corresponding
basic solution is then:

$$
x_1 = \bar{a}_{10},\ x_2 = \bar{a}_{20},\ \ldots,\ x_m = \bar{a}_{m0},\ x_{m+1} = 0,\ \ldots,\ x_n = 0,
$$

or in vector form: $\mathbf{x} = (\bar{\mathbf{a}}_0, \mathbf{0})$ where $\bar{\mathbf{a}}_0$ is m-dimensional and $\mathbf{0}$ is the (n - m)-dimensional
zero vector.

Actually, we relax our definition somewhat and consider a system to be in canon- 
ical form if, among the n variables, there are m basic ones with the property that each 
appears in only one equation, its coefficient in that equation is unity, and no two of 
these m variables appear in any one equation. This is equivalent to saying that a 
system is in canonical form if by some reordering of the equations and the variables 
it takes the form (3.4). 

Also it is customary, from the dictates of economy, to represent the system (3.4) 
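by its corresponding array of coefficients or tableau:

$$
\begin{array}{cccccccc|c}
x_1 & x_2 & \cdots & x_m & x_{m+1} & x_{m+2} & \cdots & x_n & \\ \hline
1 & 0 & \cdots & 0 & \bar{a}_{1(m+1)} & \bar{a}_{1(m+2)} & \cdots & \bar{a}_{1n} & \bar{a}_{10} \\
0 & 1 & \cdots & 0 & \bar{a}_{2(m+1)} & \bar{a}_{2(m+2)} & \cdots & \bar{a}_{2n} & \bar{a}_{20} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & \bar{a}_{m(m+1)} & \bar{a}_{m(m+2)} & \cdots & \bar{a}_{mn} & \bar{a}_{m0}
\end{array}
\tag{3.5}
$$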



The question solved by pivoting is this: given a system in canonical form, suppose
a basic variable is to be made nonbasic and a nonbasic variable is to be made basic;
what is the new canonical form corresponding to the new set of basic variables? The
procedure is quite simple. Suppose in the canonical system (3.4) we wish to replace
the basic variable $x_p$, $1 \le p \le m$, by the nonbasic variable $x_q$. This can be done if
and only if $\bar{a}_{pq}$ is nonzero; it is accomplished by dividing row p by $\bar{a}_{pq}$ to get a unit
coefficient for $x_q$ in the pth equation, and then subtracting suitable multiples of row
p from each of the other rows in order to get a zero coefficient for $x_q$ in all other
equations. This transforms the qth column of the tableau so that it is zero except
in its pth entry (which is unity) and does not affect the columns of the other basic
variables. Denoting the coefficients of the new system in canonical form by $\bar{a}'_{ij}$, we
have explicitly

$$
\begin{cases}
\bar{a}'_{ij} = \bar{a}_{ij} - \dfrac{\bar{a}_{iq}}{\bar{a}_{pq}}\,\bar{a}_{pj}, & i \ne p \\[2ex]
\bar{a}'_{pj} = \dfrac{\bar{a}_{pj}}{\bar{a}_{pq}}.
\end{cases}
\tag{3.6}
$$

Equations (3.6) are the pivot equations that arise frequently in linear programming.
The element $\bar{a}_{pq}$ in the original system is said to be the pivot element.
Example 1. Consider the system in canonical form:

$$
\begin{aligned}
x_1 + x_4 + x_5 - x_6 &= 5 \\
x_2 + 2x_4 - 3x_5 + x_6 &= 3 \\
x_3 - x_4 + 2x_5 - x_6 &= -1.
\end{aligned}
$$

Let us find the basic solution having basic variables $x_4, x_5, x_6$. We set up the
coefficient array below:

$$
\begin{array}{rrrrrr|r}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & \\ \hline
1 & 0 & 0 & \boxed{1} & 1 & -1 & 5 \\
0 & 1 & 0 & 2 & -3 & 1 & 3 \\
0 & 0 & 1 & -1 & 2 & -1 & -1
\end{array}
$$

The boxed entry is our first pivot element and corresponds to the replacement of
$x_1$ by $x_4$ as a basic variable. After pivoting we obtain the array

$$
\begin{array}{rrrrrr|r}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & \\ \hline
1 & 0 & 0 & 1 & 1 & -1 & 5 \\
-2 & 1 & 0 & 0 & \boxed{-5} & 3 & -7 \\
1 & 0 & 1 & 0 & 3 & -2 & 4
\end{array}
$$

and again we have boxed the next pivot element, indicating our intention to replace
$x_2$ by $x_5$. We then obtain

$$
\begin{array}{rrrrrr|r}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & \\ \hline
3/5 & 1/5 & 0 & 1 & 0 & -2/5 & 18/5 \\
2/5 & -1/5 & 0 & 0 & 1 & -3/5 & 7/5 \\
-1/5 & 3/5 & 1 & 0 & 0 & \boxed{-1/5} & -1/5
\end{array}
$$

Continuing, there results

$$
\begin{array}{rrrrrr|r}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & \\ \hline
1 & -1 & -2 & 1 & 0 & 0 & 4 \\
1 & -2 & -3 & 0 & 1 & 0 & 2 \\
1 & -3 & -5 & 0 & 0 & 1 & 1
\end{array}
$$

From this last canonical form we obtain the new basic solution

$$
x_4 = 4,\quad x_5 = 2,\quad x_6 = 1.
$$
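The pivot equations (3.6) and the three pivots of Example 1 can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function name `pivot`, the 0-based indexing, and the use of exact `Fraction` arithmetic are assumptions made for the sketch.

```python
from fractions import Fraction


def pivot(tableau, p, q):
    """Return a new tableau after pivoting on row p, column q (0-based)."""
    pivot_element = tableau[p][q]
    if pivot_element == 0:
        raise ValueError("pivot element must be nonzero")
    # Second line of (3.6): divide the pivot row by the pivot element.
    new_row_p = [entry / pivot_element for entry in tableau[p]]
    new_tableau = []
    for i, row in enumerate(tableau):
        if i == p:
            new_tableau.append(new_row_p)
        else:
            # First line of (3.6): subtract a_iq times the scaled pivot row.
            factor = row[q]
            new_tableau.append([a - factor * b for a, b in zip(row, new_row_p)])
    return new_tableau


# Coefficient array of Example 1; the last column is the right-hand side.
example_1 = [[Fraction(v) for v in row] for row in [
    [1, 0, 0, 1, 1, -1, 5],
    [0, 1, 0, 2, -3, 1, 3],
    [0, 0, 1, -1, 2, -1, -1],
]]

# Pivot on (row 1, x4), (row 2, x5), (row 3, x6), written here 0-based.
result = example_1
for p, q in [(0, 3), (1, 4), (2, 5)]:
    result = pivot(result, p, q)

basic_values = [row[-1] for row in result]  # values of x4, x5, x6
```

Exact rational arithmetic is used so that the intermediate entries (such as 18/5) match the hand computation; floating point would work equally well for larger problems, at the cost of rounding.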



Second Interpretation 

The set of simultaneous equations represented by (3.1) and (3.2) can be interpreted
in $E^m$ as a vector equation. Denoting the columns of $\mathbf{A}$ by $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ we write
(3.1) as

$$
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b}. \tag{3.7}
$$

In this interpretation we seek to express $\mathbf{b}$ as a linear combination of the $\mathbf{a}_j$'s.

If m < n and the vectors $\mathbf{a}_j$ span $E^m$ then there is not a unique solution but a
whole family of solutions. The vector $\mathbf{b}$ has a unique representation, however, as
a linear combination of a given linearly independent subset of these vectors. The
corresponding solution with (n - m) $x_j$ variables set equal to zero is a basic solution
to (3.1).
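As a small illustration of this column view, the sketch below represents b as a combination of a chosen set of m columns and fills in zeros for the remaining variables. The helper names `solve_square` and `basic_solution` are hypothetical, and the data are the coefficients of Example 1; the elimination routine is a plain Gauss-Jordan solve, not a method prescribed by the text.

```python
from fractions import Fraction


def solve_square(B, b):
    """Solve B y = b exactly by Gauss-Jordan elimination; None if B is singular."""
    m = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(B, b)]
    for col in range(m):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, m) if M[r][col] != 0), None)
        if pivot_row is None:
            return None
        M[col], M[pivot_row] = M[pivot_row], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]


def basic_solution(A, b, basis):
    """Basic solution of Ax = b using the columns listed in `basis` (0-based)."""
    B = [[row[j] for j in basis] for row in A]
    weights = solve_square(B, b)
    if weights is None:
        return None  # the chosen columns are linearly dependent
    x = [Fraction(0)] * len(A[0])
    for j, value in zip(basis, weights):
        x[j] = value
    return x


# Columns a_1, ..., a_6 and right-hand side b taken from Example 1.
A = [[1, 0, 0, 1, 1, -1],
     [0, 1, 0, 2, -3, 1],
     [0, 0, 1, -1, 2, -1]]
b = [5, 3, -1]

x_first = basic_solution(A, b, (0, 1, 2))   # basis a_1, a_2, a_3
x_last = basic_solution(A, b, (3, 4, 5))    # basis a_4, a_5, a_6
```

Each choice of m linearly independent columns yields one basic solution; the second call reproduces the result reached by pivoting in Example 1.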

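Suppose now that we start again with a system in the canonical form corresponding
to the tableau:

$$
\begin{array}{cccccccc|c}
\mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_m & \mathbf{a}_{m+1} & \mathbf{a}_{m+2} & \cdots & \mathbf{a}_n & \mathbf{b} \\ \hline
1 & 0 & \cdots & 0 & \bar{a}_{1(m+1)} & \bar{a}_{1(m+2)} & \cdots & \bar{a}_{1n} & \bar{a}_{10} \\
0 & 1 & \cdots & 0 & \bar{a}_{2(m+1)} & \bar{a}_{2(m+2)} & \cdots & \bar{a}_{2n} & \bar{a}_{20} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & \bar{a}_{m(m+1)} & \bar{a}_{m(m+2)} & \cdots & \bar{a}_{mn} & \bar{a}_{m0}
\end{array}
\tag{3.8}
$$

In this case the first m vectors form a basis. Furthermore, every other vector represented
in the tableau can be expressed as a linear combination of these basis vectors
by simply reading the coefficients down the corresponding column. Thus

$$
\mathbf{a}_j = \bar{a}_{1j}\mathbf{a}_1 + \bar{a}_{2j}\mathbf{a}_2 + \cdots + \bar{a}_{mj}\mathbf{a}_m. \tag{3.9}
$$

The tableau can be interpreted as giving the representations of the vectors $\mathbf{a}_j$
in terms of the basis; the jth column of the tableau is the representation for the
vector $\mathbf{a}_j$. In particular, the expression for $\mathbf{b}$ in terms of the basis is given in the last
column.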

We now consider the operation of replacing one member of the basis by another
vector not already in the basis. Suppose for example we wish to replace the basis
vector $\mathbf{a}_p$, $1 \le p \le m$, by the vector $\mathbf{a}_q$. Provided that the first m vectors with $\mathbf{a}_p$
replaced by $\mathbf{a}_q$ are linearly independent, these vectors constitute a basis and every
vector can be expressed as a linear combination of this new basis. To find the new
representations of the vectors we must update the tableau. The linear independence
condition holds if and only if $\bar{a}_{pq} \ne 0$.

Any vector $\mathbf{a}_j$ can be expressed in terms of the old array through (3.9). For $\mathbf{a}_q$ we
have

$$
\mathbf{a}_q = \sum_{\substack{i=1 \\ i \ne p}}^{m} \bar{a}_{iq}\mathbf{a}_i + \bar{a}_{pq}\mathbf{a}_p
$$

from which we may solve for $\mathbf{a}_p$,

$$
\mathbf{a}_p = \frac{1}{\bar{a}_{pq}}\mathbf{a}_q - \sum_{\substack{i=1 \\ i \ne p}}^{m} \frac{\bar{a}_{iq}}{\bar{a}_{pq}}\mathbf{a}_i. \tag{3.10}
$$

Substituting (3.10) into (3.9) we obtain:

$$
\mathbf{a}_j = \sum_{\substack{i=1 \\ i \ne p}}^{m} \left( \bar{a}_{ij} - \frac{\bar{a}_{iq}}{\bar{a}_{pq}}\,\bar{a}_{pj} \right)\mathbf{a}_i + \frac{\bar{a}_{pj}}{\bar{a}_{pq}}\mathbf{a}_q. \tag{3.11}
$$

Denoting the coefficients of the new tableau, which give the linear combinations,
by $\bar{a}'_{ij}$, we obtain immediately from (3.11)

$$
\begin{cases}
\bar{a}'_{ij} = \bar{a}_{ij} - \dfrac{\bar{a}_{iq}}{\bar{a}_{pq}}\,\bar{a}_{pj}, & i \ne p \\[2ex]
\bar{a}'_{pj} = \dfrac{\bar{a}_{pj}}{\bar{a}_{pq}}.
\end{cases}
\tag{3.12}
$$

These formulas are identical to (3.6).

If a system of equations is not originally given in canonical form, we may put 
it into canonical form by adjoining the m unit vectors to the tableau and, starting 
with these vectors as the basis, successively replace each of them with columns of 
A using the pivot operation. 

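Example 2. Suppose we wish to solve the simultaneous equations

$$
\begin{aligned}
x_1 + x_2 - x_3 &= 5 \\
2x_1 - 3x_2 + x_3 &= 3 \\
-x_1 + 2x_2 - x_3 &= -1.
\end{aligned}
$$

To obtain an original basis, we form the augmented tableau

$$
\begin{array}{rrrrrr|r}
\mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 & \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{b} \\ \hline
1 & 0 & 0 & 1 & 1 & -1 & 5 \\
0 & 1 & 0 & 2 & -3 & 1 & 3 \\
0 & 0 & 1 & -1 & 2 & -1 & -1
\end{array}
$$

and replace $\mathbf{e}_1$ by $\mathbf{a}_1$, $\mathbf{e}_2$ by $\mathbf{a}_2$, and $\mathbf{e}_3$ by $\mathbf{a}_3$. The required operations are identical
to those of Example 1.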
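The procedure of Example 2 can be sketched as follows, under the assumption that each pivot element encountered is nonzero (true for this data). The helper `pivot` is a hypothetical name repeated here so that the block is self-contained; it applies the pivot equations (3.6).

```python
from fractions import Fraction


def pivot(tableau, p, q):
    """Pivot on row p, column q (0-based) using the pivot equations (3.6)."""
    pivot_element = tableau[p][q]
    if pivot_element == 0:
        raise ValueError("pivot element must be nonzero")
    new_row_p = [entry / pivot_element for entry in tableau[p]]
    return [
        new_row_p if i == p
        else [a - row[q] * b for a, b in zip(row, new_row_p)]
        for i, row in enumerate(tableau)
    ]


A = [[1, 1, -1],
     [2, -3, 1],
     [-1, 2, -1]]
b = [5, 3, -1]
m = len(A)

# Augmented tableau [e_1 e_2 e_3 | a_1 a_2 a_3 | b].
tableau = [
    [Fraction(int(i == k)) for k in range(m)]
    + [Fraction(v) for v in A[i]]
    + [Fraction(b[i])]
    for i in range(m)
]

# Replace e_i by a_i for i = 1, 2, 3: pivot in row i on the column of a_i.
for i in range(m):
    tableau = pivot(tableau, i, m + i)

solution = [row[-1] for row in tableau]       # x_1, x_2, x_3
unit_columns = [row[:m] for row in tableau]   # representations of e_1, e_2, e_3
```

After the three pivots the last column holds the solution, and the columns that started as unit vectors hold their representations in the new basis, which for a square nonsingular system is the inverse of the coefficient matrix.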



3.2 Adjacent Extreme Points 

In Chap. 2 it was discovered that it is only necessary to consider basic feasible
solutions to the system

$$
\mathbf{A}\mathbf{x} = \mathbf{b},\quad \mathbf{x} \ge \mathbf{0} \tag{3.13}
$$

when solving a linear program, and in the previous section it was demonstrated that
the pivot operation can generate a new basic solution from an old one by replacing
one basic variable by a nonbasic variable. It is clear, however, that although the pivot
operation takes one basic solution into another, the nonnegativity of the solution will
not in general be preserved. Special conditions must be satisfied in order that a pivot 
operation maintain feasibility. In this section we show how it is possible to select 
pivots so that we may transfer from one basic feasible solution to another. 

We show that although it is not possible to arbitrarily specify the pair of vari- 
ables whose roles are to be interchanged and expect to maintain the nonnegativity 
condition, it is possible to arbitrarily specify which nonbasic variable is to become 
basic and then determine which basic variable should become nonbasic. As is con- 
ventional, we base our derivation on the vector interpretation of the linear equations 
although the dual interpretation could alternatively be used. 



Nondegeneracy Assumption 



Many arguments in linear programming are substantially simplified upon the intro- 
duction of the following. 

Nondegeneracy Assumption: Every basic feasible solution of (3.13) is a nondegenerate 
basic feasible solution. 

This assumption is invoked throughout our development of the simplex method, 
since when it does not hold the simplex method can break down if it is not suitably 
amended. The assumption, however, should be regarded as one made primarily for 
convenience, since all arguments can be extended to include degeneracy, and the 
simplex method itself can be easily modified to account for it. 



Determination of Vector to Leave Basis 

Suppose we have the basic feasible solution $\mathbf{x} = (x_1, x_2, \ldots, x_m, 0, 0, \ldots, 0)$ or,
equivalently, the representation

$$
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_m\mathbf{a}_m = \mathbf{b}. \tag{3.14}
$$

Under the nondegeneracy assumption, $x_i > 0$, $i = 1, 2, \ldots, m$. Suppose also that
we have decided to bring into the representation the vector $\mathbf{a}_q$, $q > m$. We have
available a representation of $\mathbf{a}_q$ in terms of the current basis

$$
\mathbf{a}_q = \bar{a}_{1q}\mathbf{a}_1 + \bar{a}_{2q}\mathbf{a}_2 + \cdots + \bar{a}_{mq}\mathbf{a}_m. \tag{3.15}
$$

Multiplying (3.15) by a variable $\varepsilon \ge 0$ and subtracting from (3.14), we have

$$
(x_1 - \varepsilon\bar{a}_{1q})\mathbf{a}_1 + (x_2 - \varepsilon\bar{a}_{2q})\mathbf{a}_2 + \cdots + (x_m - \varepsilon\bar{a}_{mq})\mathbf{a}_m + \varepsilon\mathbf{a}_q = \mathbf{b}. \tag{3.16}
$$
Thus, for any $\varepsilon \ge 0$, (3.16) gives $\mathbf{b}$ as a linear combination of at most m + 1 vectors.
For $\varepsilon = 0$ we have the old basic feasible solution. As $\varepsilon$ is increased from zero,
the coefficient of $\mathbf{a}_q$ increases, and it is clear that for small enough $\varepsilon$, (3.16) gives
a feasible but nonbasic solution. The coefficients of the other vectors will either
increase or decrease linearly as $\varepsilon$ is increased. If any decrease, we may set $\varepsilon$ equal
to the value corresponding to the first place where one (or more) of the coefficients
vanishes. That is

$$
\varepsilon = \min_i \{ x_i/\bar{a}_{iq} : \bar{a}_{iq} > 0 \}. \tag{3.17}
$$

In this case we have a new basic feasible solution, with the vector $\mathbf{a}_q$ replacing the
vector $\mathbf{a}_p$, where p corresponds to the minimizing index in (3.17). If the minimum in
(3.17) is achieved by more than a single index i, then the new solution is degenerate
and any of the vectors with zero component can be regarded as the one that left the
basis.

If none of the $\bar{a}_{iq}$'s are positive, then all coefficients in the representation (3.16)
increase (or remain constant) as $\varepsilon$ is increased, and no new basic feasible solution is
obtained. We observe, however, that in this case, where none of the $\bar{a}_{iq}$'s are positive,
there are feasible solutions to (3.13) having arbitrarily large coefficients. This means
that the set K of feasible solutions to (3.13) is unbounded, and this special case, as
we shall see, is of special significance in the simplex procedure.

In summary, we have deduced that, given a basic feasible solution and an arbitrary
vector $\mathbf{a}_q$, there is either a new basic feasible solution having $\mathbf{a}_q$ in its basis and
one of the original vectors removed, or the set of feasible solutions is unbounded.

Let us consider how the calculation of this section can be displayed in our
tableau. We assume that corresponding to the constraints

$$
\mathbf{A}\mathbf{x} = \mathbf{b},\quad \mathbf{x} \ge \mathbf{0},
$$

we have a tableau of the form (3.8). Note that the tableau may be the result of several
pivot operations applied to the original tableau, but in any event, it represents a
solution with basis $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$. We assume that $\bar{a}_{10}, \bar{a}_{20}, \ldots, \bar{a}_{m0}$ are nonnegative,
so that the corresponding basic solution $x_1 = \bar{a}_{10}, x_2 = \bar{a}_{20}, \ldots, x_m = \bar{a}_{m0}$ is
feasible. We wish to bring into the basis the vector $\mathbf{a}_q$, $q > m$, and maintain feasibility.
In order to determine which element in the qth column to use as the pivot (and
hence which vector in the basis will leave), we use (3.17) and compute the ratios
$x_i/\bar{a}_{iq} = \bar{a}_{i0}/\bar{a}_{iq}$, $i = 1, 2, \ldots, m$, select the smallest nonnegative ratio, and pivot on
the corresponding $\bar{a}_{iq}$.

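Example 3. Consider the system

$$
\begin{array}{rrrrrr|r}
\mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 & \mathbf{a}_5 & \mathbf{a}_6 & \mathbf{b} \\ \hline
1 & 0 & 0 & 2 & 4 & 6 & 4 \\
0 & 1 & 0 & 1 & 2 & 3 & 3 \\
0 & 0 & 1 & -1 & 2 & 1 & 1
\end{array}
$$

which has basis $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ yielding a basic feasible solution $\mathbf{x} = (4, 3, 1, 0, 0, 0)$.
Suppose we elect to bring $\mathbf{a}_4$ into the basis. To determine which element in the
fourth column is the appropriate pivot, we compute the three ratios:

$$
4/2 = 2,\quad 3/1 = 3,\quad 1/(-1) = -1
$$

and select the smallest nonnegative one. This gives 2 as the pivot element. The new
tableau is

$$
\begin{array}{rrrrrr|r}
\mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 & \mathbf{a}_5 & \mathbf{a}_6 & \mathbf{b} \\ \hline
1/2 & 0 & 0 & 1 & 2 & 3 & 2 \\
-1/2 & 1 & 0 & 0 & 0 & 0 & 1 \\
1/2 & 0 & 1 & 0 & 4 & 4 & 3
\end{array}
$$

with corresponding basic feasible solution $\mathbf{x} = (0, 1, 3, 2, 0, 0)$.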
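The ratio test (3.17) and the pivot of Example 3 can be sketched as below. The names `leaving_row` and `pivot` are hypothetical. Following (3.17), the sketch considers only rows whose entry in the entering column is strictly positive, and it returns `None` when no such row exists, which corresponds to the unbounded case discussed above.

```python
from fractions import Fraction


def leaving_row(tableau, q):
    """Ratio test (3.17): row minimizing a_i0 / a_iq over rows with a_iq > 0.

    Returns None if no entry of column q is positive (unbounded case).
    """
    best_row, best_ratio = None, None
    for i, row in enumerate(tableau):
        if row[q] > 0:
            ratio = row[-1] / row[q]
            if best_ratio is None or ratio < best_ratio:
                best_row, best_ratio = i, ratio
    return best_row


def pivot(tableau, p, q):
    """Pivot on row p, column q (0-based) using the pivot equations (3.6)."""
    new_row_p = [entry / tableau[p][q] for entry in tableau[p]]
    return [
        new_row_p if i == p
        else [a - row[q] * b for a, b in zip(row, new_row_p)]
        for i, row in enumerate(tableau)
    ]


# Tableau of Example 3; the last column is b.
example_3 = [[Fraction(v) for v in row] for row in [
    [1, 0, 0, 2, 4, 6, 4],
    [0, 1, 0, 1, 2, 3, 3],
    [0, 0, 1, -1, 2, 1, 1],
]]

q = 3                                  # bring a_4 into the basis (0-based column 3)
p = leaving_row(example_3, q)          # row of the vector that leaves
new_tableau = pivot(example_3, p, q)
new_rhs = [row[-1] for row in new_tableau]
```

Here the first row wins the ratio test, so a_1 leaves the basis, and the new right-hand side gives the values of the basic variables x_4, x_2, x_3 in that row order.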



Our derivation of the method for selecting the pivot in a given column that will
yield a new feasible solution has been based on the vector interpretation of the equation
$\mathbf{A}\mathbf{x} = \mathbf{b}$. An alternative derivation can be constructed by considering the dual
approach that is based on the rows of the tableau rather than the columns. Briefly,
the argument runs like this: if we decide to pivot on $\bar{a}_{pq}$, then we first divide the pth
row by the pivot element $\bar{a}_{pq}$ to change it to unity. In order that the new $\bar{a}_{p0}$ remain
positive, it is clear that we must have $\bar{a}_{pq} > 0$. Next we subtract multiples of the
pth row from each other row in order to obtain zeros in the qth column. In this process
the new elements in the last column must remain nonnegative, if the pivot was
properly selected. The full operation is to subtract, from the ith row, $\bar{a}_{iq}/\bar{a}_{pq}$ times
the pth row. This yields a new solution obtained directly from the last column:

$$
x'_i = x_i - \frac{\bar{a}_{iq}}{\bar{a}_{pq}}\,x_p.
$$

For this to remain nonnegative, it follows that $x_p/\bar{a}_{pq} \le x_i/\bar{a}_{iq}$ for every i with
$\bar{a}_{iq} > 0$, and hence again we are led to the conclusion that we select p as the index i
minimizing $x_i/\bar{a}_{iq}$.



Geometrical Interpretations 



Corresponding to the two interpretations of pivoting and extreme points developed
algebraically, are two geometrical interpretations. The first is in activity space, the
space where $\mathbf{x}$ is represented. This is perhaps the most natural space to consider, and
it was used in Sect. 2.5. Here the feasible region is shown directly as a convex set,
and basic feasible solutions are extreme points. Adjacent extreme points are points
that lie on a common edge.

The second geometrical interpretation is in requirements space, the space where
the columns of $\mathbf{A}$ and $\mathbf{b}$ are represented. The fundamental relation is

$$
\mathbf{a}_1x_1 + \mathbf{a}_2x_2 + \cdots + \mathbf{a}_nx_n = \mathbf{b}.
$$
Fig. 3.1 Constraint representation in requirements space

An example for m = 2, n = 4 is shown in Fig. 3.1. A feasible solution defines a
representation of $\mathbf{b}$ as a positive combination of the $\mathbf{a}_i$'s. A basic feasible solution
will use only m positive weights. In the figure a basic feasible solution can be constructed
with positive weights on $\mathbf{a}_1$ and $\mathbf{a}_2$ because $\mathbf{b}$ lies between them. A basic
feasible solution cannot be constructed with positive weights on $\mathbf{a}_1$ and $\mathbf{a}_4$. Suppose
we start with $\mathbf{a}_1$ and $\mathbf{a}_2$ as the initial basis. Then an adjacent basis is found by bringing
in some other vector. If $\mathbf{a}_3$ is brought in, then clearly $\mathbf{a}_2$ must go out. On the
other hand, if $\mathbf{a}_4$ is brought in, $\mathbf{a}_1$ must go out.



3.3 Determining a Minimum Feasible Solution 

In the last section we showed how it is possible to pivot from one basic feasible 
solution to another (or determine that the solution set is unbounded) by arbitrarily 
selecting a column to pivot on and then appropriately selecting the pivot in that 
column. The idea of the simplex method is to select the column so that the resulting 
new basic feasible solution will yield a lower value to the objective function than 
the previous one. This then provides the final link in the simplex procedure. By an 
elementary calculation, which is derived below, it is possible to determine which 
vector should enter the basis so that the objective value is reduced, and by another 
simple calculation, derived in the previous section, it is possible to then determine 
which vector should leave in order to maintain feasibility. 

Suppose we have a basic feasible solution

$$
(\mathbf{x}_B, \mathbf{0}) = (\bar{a}_{10}, \bar{a}_{20}, \ldots, \bar{a}_{m0}, 0, 0, \ldots, 0)
$$

together with a tableau having an identity matrix appearing in the first m columns
as shown in tableau (3.8). The value of the objective function corresponding to any
solution $\mathbf{x}$ is

$$
z = c_1x_1 + c_2x_2 + \cdots + c_nx_n, \tag{3.18}
$$

and hence for the basic solution, the corresponding value is

$$
z_0 = \mathbf{c}_B^T\mathbf{x}_B, \tag{3.19}
$$

where $\mathbf{c}_B^T = [c_1, c_2, \ldots, c_m]$.

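Although it is natural to use the basic solution $(\mathbf{x}_B, \mathbf{0})$ when we have the tableau
(3.8), it is clear that if arbitrary values are assigned to $x_{m+1}, x_{m+2}, \ldots, x_n$, we can
easily solve for the remaining variables as

$$
\begin{aligned}
x_1 &= \bar{a}_{10} - \sum_{j=m+1}^{n} \bar{a}_{1j}x_j \\
x_2 &= \bar{a}_{20} - \sum_{j=m+1}^{n} \bar{a}_{2j}x_j \\
&\ \,\vdots \\
x_m &= \bar{a}_{m0} - \sum_{j=m+1}^{n} \bar{a}_{mj}x_j.
\end{aligned}
\tag{3.20}
$$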



Using (3.20) we may eliminate $x_1, x_2, \ldots, x_m$ from the general formula (3.18).
Doing this we obtain

$$
z = \mathbf{c}^T\mathbf{x} = z_0 + (c_{m+1} - z_{m+1})x_{m+1} + (c_{m+2} - z_{m+2})x_{m+2} + \cdots + (c_n - z_n)x_n \tag{3.21}
$$

where

$$
z_j = \bar{a}_{1j}c_1 + \bar{a}_{2j}c_2 + \cdots + \bar{a}_{mj}c_m, \quad m + 1 \le j \le n, \tag{3.22}
$$

which is the fundamental relation required to determine the pivot column. The important
point is that this equation gives the values of the objective function z for
any solution of $\mathbf{A}\mathbf{x} = \mathbf{b}$ in terms of the variables $x_{m+1}, \ldots, x_n$. From it we can
determine if there is any advantage in changing the basic solution by introducing
one of the nonbasic variables. For example, if $c_j - z_j$ is negative for some j, $m + 1 \le j \le n$,
then increasing $x_j$ from zero to some positive value would decrease the total
cost, and therefore would yield a better solution. Formulas (3.21) and (3.22)
automatically take into account the changes that would be required in the values of
the basic variables $x_1, x_2, \ldots, x_m$ to accommodate the change in $x_j$.

Let us derive these relations from a different viewpoint. Let $\bar{\mathbf{a}}_j$ be the jth column
of the tableau. Then any solution satisfies

$$
x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_m\mathbf{e}_m = \bar{\mathbf{a}}_0 - x_{m+1}\bar{\mathbf{a}}_{m+1} - x_{m+2}\bar{\mathbf{a}}_{m+2} - \cdots - x_n\bar{\mathbf{a}}_n.
$$

Taking the inner product of this vector equation with $\mathbf{c}_B^T$, we have

$$
\sum_{j=1}^{m} c_jx_j = \mathbf{c}_B^T\bar{\mathbf{a}}_0 - \sum_{j=m+1}^{n} z_jx_j,
$$

where $z_j = \mathbf{c}_B^T\bar{\mathbf{a}}_j$. Thus, adding $\sum_{j=m+1}^{n} c_jx_j$ to both sides,

$$
\mathbf{c}^T\mathbf{x} = z_0 + \sum_{j=m+1}^{n} (c_j - z_j)x_j \tag{3.23}
$$

as before.
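Relations (3.20) to (3.23) can be checked numerically with a short sketch. The tableau below is the one from Example 3; the cost vector `c` is a hypothetical choice made only for illustration, as are the function names `reduced_costs` and `solve_for_basic`.

```python
from fractions import Fraction


def reduced_costs(tableau, c, basis):
    """Return (z0, r), where r[j] = c[j] - z[j] and z[j] = c_B^T (tableau column j)."""
    c_B = [c[j] for j in basis]
    n = len(c)
    z = [sum(cb * row[j] for cb, row in zip(c_B, tableau)) for j in range(n)]
    z0 = sum(cb * row[-1] for cb, row in zip(c_B, tableau))
    return z0, [c[j] - z[j] for j in range(n)]


def solve_for_basic(tableau, basis, nonbasic_values):
    """Equation (3.20): basic variables as functions of chosen nonbasic values."""
    n = len(tableau[0]) - 1
    x = [Fraction(0)] * n
    for j, value in nonbasic_values.items():
        x[j] = Fraction(value)
    for row, k in zip(tableau, basis):
        x[k] = row[-1] - sum(row[j] * x[j] for j in nonbasic_values)
    return x


# Canonical tableau of Example 3 with basis a_1, a_2, a_3 (0-based columns 0, 1, 2).
tableau = [[Fraction(v) for v in row] for row in [
    [1, 0, 0, 2, 4, 6, 4],
    [0, 1, 0, 1, 2, 3, 3],
    [0, 0, 1, -1, 2, 1, 1],
]]
basis = [0, 1, 2]
c = [Fraction(v) for v in [1, 2, 3, 4, 1, 2]]   # hypothetical cost vector

z0, r = reduced_costs(tableau, c, basis)

# Assign arbitrary values to two nonbasic variables and recover the basic ones.
x = solve_for_basic(tableau, basis, {3: Fraction(1, 2), 4: Fraction(1, 4)})
direct_cost = sum(cj * xj for cj, xj in zip(c, x))
cost_from_323 = z0 + sum(rj * xj for rj, xj in zip(r, x))
```

With these illustrative costs the coefficients for the fifth and sixth columns are negative, so by the argument above increasing either variable from zero would lower the objective; the two cost computations agree, as (3.23) requires.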

We now state the condition for improvement, which follows easily from the 
above observation, as a theorem. 

Theorem (Improvement of Basic Feasible Solution). Given a nondegenerate basic feasible
solution with corresponding objective value $z_0$, suppose that for some j there holds
$c_j - z_j < 0$. Then there is a feasible solution with objective value $z < z_0$. If the column $\mathbf{a}_j$ can
be substituted for some vector in the original basis to yield a new basic feasible solution,
this new solution will have $z < z_0$. If $\mathbf{a}_j$ cannot be substituted to yield a basic feasible
solution, then the solution set K is unbounded and the objective function can be made arbitrarily
small (toward minus infinity).

Proof. The result is an immediate consequence of the previous discussion. Let
$(x_1, x_2, \ldots, x_m, 0, 0, \ldots, 0)$ be the basic feasible solution with objective value
$z_0$ and suppose $c_{m+1} - z_{m+1} < 0$. Then, in any case, new feasible solutions can be
constructed of the form $(x'_1, x'_2, \ldots, x'_m, x'_{m+1}, 0, 0, \ldots, 0)$ with $x'_{m+1} > 0$. Substituting
this solution in (3.21) we have

$$
z - z_0 = (c_{m+1} - z_{m+1})\,x'_{m+1} < 0,
$$

and hence $z < z_0$ for any such solution. It is clear that we desire to make $x'_{m+1}$ as large
as possible. As $x'_{m+1}$ is increased, the other components increase, remain constant,
or decrease. Thus $x'_{m+1}$ can be increased until one $x'_i = 0$, $i \le m$, in which case
we obtain a new basic feasible solution, or if none of the $x'_i$'s decrease, $x'_{m+1}$ can
be increased without bound, indicating an unbounded solution set and an objective
value without lower bound. ∎

We see that if at any stage $c_j - z_j < 0$ for some j, it is possible to make $x_j$
positive and decrease the objective function. The final question remaining is whether
$c_j - z_j \ge 0$ for all j implies optimality.

Optimality Condition Theorem. If for some basic feasible solution $c_j - z_j \ge 0$ for all j, then
that solution is optimal.

Proof. This follows immediately from (3.21), since any other feasible solution must
have $x_i \ge 0$ for all i, and hence the value z of the objective will satisfy $z - z_0 \ge 0$. ∎

Since the constants $c_j - z_j$ play such a central role in the development of the
simplex method, it is convenient to introduce the somewhat abbreviated notation
$r_j = c_j - z_j$ and refer to the $r_j$'s as the relative cost coefficients or, alternatively, the
reduced cost coefficients (both terms occur in common usage). These coefficients
measure the cost of a variable relative to a given basis. (For notational convenience
we extend the definition of relative cost coefficients to basic variables as well; the
relative cost coefficient of a basic variable is zero.)
We conclude this section by giving an economic interpretation of the relative cost
coefficients. Let us agree to interpret the linear program

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{A}\mathbf{x} = \mathbf{b},\ \mathbf{x} \ge \mathbf{0}
\end{array}
$$

as a diet problem (see Sect. 2.2) where the nutritional requirements must be met
exactly. A column of $\mathbf{A}$ gives the nutritional equivalent of a unit of a particular food.
With a given basis consisting of, say, the first m columns of $\mathbf{A}$, the corresponding
simplex tableau shows how any food (or more precisely, the nutritional content of
any food) can be constructed as a combination of foods in the basis. For instance,
if carrots are not in the basis we can, using the description given by the tableau,
construct a synthetic carrot which is nutritionally equivalent to a carrot, by an appropriate
combination of the foods in the basis.

In considering whether or not the solution represented by the current basis is 
optimal, we consider a certain food not in the basis — say carrots — and determine if 
it would be advantageous to bring it into the basis. This is very easily determined 
by examining the cost of carrots as compared with the cost of synthetic carrots. If 
carrots are food j, then the unit cost of carrots is $c_j$. The cost of a unit of synthetic
carrots is, on the other hand,

$$
z_j = \sum_{i=1}^{m} c_i\bar{a}_{ij}.
$$

If $r_j = c_j - z_j < 0$, it is advantageous to use real carrots in place of synthetic carrots,
and carrots should be brought into the basis.

In general each $z_j$ can be thought of as the price of a unit of the column $\mathbf{a}_j$ when
constructed from the current basis. The difference between this synthetic price and
the direct price of that column determines whether that column should enter the
basis.



3.4 Computational Procedure: Simplex Method 

In previous sections the theory, and indeed much of the technique, necessary for 
the detailed development of the simplex method has been established. It is only 
necessary to put it all together and illustrate it with examples. 

In this section we assume that we begin with a basic feasible solution and that the 
tableau corresponding to Ax = b is in the canonical form for this solution. Methods 
for obtaining this first basic feasible solution, when one is not obvious, are described 
in the next section. 

In addition to beginning with the array Ax = b expressed in canonical form 
corresponding to a basic feasible solution, we append a row at the bottom consisting 
of the relative cost coefficients and the negative of the current cost. The result is a 
simplex tableau.

Thus, if we assume the basic variables are (in order) $x_1, x_2, \ldots, x_m$, the simplex
tableau takes the initial form shown in Fig. 3.2.

The basic solution corresponding to this tableau is

$$x_i = \begin{cases} \bar a_{i0}, & 1 \le i \le m \\ 0, & m + 1 \le i \le n \end{cases}$$

which we have assumed is feasible, that is, $\bar a_{i0} \ge 0$, $i = 1, 2, \ldots, m$. The
corresponding value of the objective function is $z_0$.



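$$
\begin{array}{cccccccccc|c}
\mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_m & \mathbf{a}_{m+1} & \mathbf{a}_{m+2} & \cdots & \mathbf{a}_j & \cdots & \mathbf{a}_n & \mathbf{b} \\ \hline
1 & 0 & \cdots & 0 & \bar a_{1(m+1)} & \bar a_{1(m+2)} & \cdots & \bar a_{1j} & \cdots & \bar a_{1n} & \bar a_{10} \\
0 & 1 & \cdots & 0 & \bar a_{2(m+1)} & \bar a_{2(m+2)} & \cdots & \bar a_{2j} & \cdots & \bar a_{2n} & \bar a_{20} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & \bar a_{i(m+1)} & \bar a_{i(m+2)} & \cdots & \bar a_{ij} & \cdots & \bar a_{in} & \bar a_{i0} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & \bar a_{m(m+1)} & \bar a_{m(m+2)} & \cdots & \bar a_{mj} & \cdots & \bar a_{mn} & \bar a_{m0} \\ \hline
0 & 0 & \cdots & 0 & r_{m+1} & r_{m+2} & \cdots & r_j & \cdots & r_n & -z_0
\end{array}
$$

Fig. 3.2 Canonical simplex tableau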



The relative cost coefficients rj indicate whether the value of the objective will 
increase or decrease if xj is pivoted into the solution. If these coefficients are all 
nonnegative, then the indicated solution is optimal. If some of them are negative, an 
improvement can be made (assuming nondegeneracy) by bringing the correspond- 
ing component into the solution. When more than one of the relative cost coefficients 
is negative, any one of them may be selected to determine in which column to pivot. 
Common practice is to select the most negative value. (See Exercise 13 for further 
discussion of this point.) 

Some more discussion of the relative cost coefficients and the last row of the 
tableau is warranted. We may regard z as an additional variable and 

$$c_1 x_1 + c_2 x_2 + \cdots + c_n x_n - z = 0$$

as another equation. A basic solution to the augmented system will have m + 1 basic 
variables, but we can require that z be one of them. For this reason it is not neces- 
sary to add a column corresponding to z, since it would always be (0, 0, . . . , 0, 1). 
Thus, initially, a last row consisting of the $c_j$'s and a right-hand side of zero can be
appended to the standard array to represent this additional equation. Using standard
pivot operations, the elements in this row corresponding to basic variables can be
reduced to zero. This is equivalent to transforming the additional equation to the
form

$$r_{m+1} x_{m+1} + r_{m+2} x_{m+2} + \cdots + r_n x_n - z = -z_0. \tag{3.24}$$

This must be equivalent to (3.23), and hence the $r_j$'s obtained are the relative cost
coefficients. Thus, the last row can be treated operationally like any other row: just
start with the $c_j$'s and reduce the terms corresponding to basic variables to zero by row
operations.

After a column $q$ is selected in which to pivot, the final selection of the pivot
element is made by computing the ratio $\bar a_{i0}/\bar a_{iq}$ for the positive elements
$\bar a_{iq}$, $i = 1, 2, \ldots, m$, of the $q$th column and selecting the element $p$ yielding
the minimum ratio. Pivoting on this element will maintain feasibility as well as (assuming
nondegeneracy) decrease the value of the objective function. If there are ties, any element
yielding the minimum can be used. If there are no positive elements in the column,
the problem is unbounded. After updating the entire tableau with $\bar a_{pq}$ as pivot
and transforming the last row in the same manner as all other rows (except row $p$),
we obtain a new tableau in canonical form. The new value of the objective function
again appears in the lower right-hand corner of the tableau.

The simplex algorithm can be summarized by the following steps: 

Step 0. Form a tableau as in Fig. 3.2 corresponding to a basic feasible solution.
The relative cost coefficients can be found by row reduction.

Step 1. If each $r_j \ge 0$, stop; the current basic feasible solution is optimal.

Step 2. Select $q$ such that $r_q < 0$ to determine which nonbasic variable is to
become basic.

Step 3. Calculate the ratios $\bar a_{i0}/\bar a_{iq}$ for $\bar a_{iq} > 0$, $i = 1, 2, \ldots, m$.
If no $\bar a_{iq} > 0$, stop; the problem is unbounded. Otherwise, select $p$ as the index $i$
corresponding to the minimum ratio.

Step 4. Pivot on the $pq$th element, updating all rows including the last. Return to
Step 1.

Proof that the algorithm solves the problem (again assuming nondegeneracy) is 
essentially established by our previous development. The process terminates only 
if optimality is achieved or unboundedness is discovered. If neither condition is 
discovered at a given basic solution, then the objective is strictly decreased. Since 
there are only a finite number of possible basic feasible solutions, and no basis 
repeats because of the strictly decreasing objective, the algorithm must reach a basis 
satisfying one of the two terminating conditions. 
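
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the names `pivot` and `simplex` are our own, exact fractions are used to avoid rounding, the entering column is the most negative relative cost, and no anticycling safeguard is included, so termination relies on the nondegeneracy assumption. The data are those of Example 1 below.

```python
from fractions import Fraction as F

def pivot(T, p, q):
    """Pivot the tableau T in place on row p, column q."""
    piv = T[p][q]
    T[p] = [v / piv for v in T[p]]
    for i in range(len(T)):
        if i != p and T[i][q] != 0:
            f = T[i][q]
            T[i] = [a - f * b for a, b in zip(T[i], T[p])]

def simplex(T, basis):
    """Apply Steps 1-4 to a canonical tableau.

    T holds m constraint rows followed by the row [r_1, ..., r_n, -z_0];
    basis[i] is the column that is basic in row i. Both are updated in place.
    Returns "optimal" or "unbounded".
    """
    m, n = len(T) - 1, len(T[0]) - 1
    while True:
        q = min(range(n), key=lambda j: T[m][j])   # Step 2: most negative r_j
        if T[m][q] >= 0:
            return "optimal"                        # Step 1: all r_j >= 0
        rows = [i for i in range(m) if T[i][q] > 0]
        if not rows:
            return "unbounded"                      # Step 3: no positive entry
        p = min(rows, key=lambda i: T[i][n] / T[i][q])  # minimum ratio
        pivot(T, p, q)                              # Step 4
        basis[p] = q

# Example 1 in standard form: slack columns 3, 4, 5 form the initial basis.
T = [[F(v) for v in row] for row in [
    [2, 1, 1, 1, 0, 0, 2],
    [1, 2, 3, 0, 1, 0, 5],
    [2, 2, 1, 0, 0, 1, 6],
    [-3, -1, -3, 0, 0, 0, 0],
]]
basis = [3, 4, 5]
status = simplex(T, basis)
x = [F(0)] * 6
for i, j in enumerate(basis):
    x[j] = T[i][-1]
print(status, x, -T[-1][-1])
```

Because this sketch always enters the most negative column, it visits a different sequence of tableaus than the hand calculation in Example 1, but it stops at the same optimal basis.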

Example 1. Maximize $3x_1 + x_2 + 3x_3$ subject to

$$
\begin{aligned}
2x_1 + x_2 + x_3 &\le 2 \\
x_1 + 2x_2 + 3x_3 &\le 5 \\
2x_1 + 2x_2 + x_3 &\le 6 \\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 &\ge 0.
\end{aligned}
$$



To transform the problem into standard form so that the simplex procedure can be 
applied, we change the maximization to minimization by multiplying the objective 
function by minus one, and introduce three nonnegative slack variables $x_4$, $x_5$, $x_6$.
We then have the initial tableau

            a1     a2     a3     a4     a5     a6      b
           (2)    (1)     1      1      0      0       2
            1      2     (3)     0      1      0       5
            2      2      1      0      0      1       6
    r^T    -3     -1     -3      0      0      0       0

First tableau

The problem is already in canonical form with the three slack variables serving as
the basic variables. We have at this point $r_j = c_j - z_j = c_j$, since the costs of the slacks
are zero. Application of the criterion for selecting a column in which to pivot shows
that any of the first three columns would yield an improved solution. In each of these
columns the appropriate pivot element is determined by computing the ratios $\bar a_{i0}/\bar a_{ij}$
and selecting the smallest positive one. The three allowable pivots are all marked in
parentheses on the tableau. It is only necessary to determine one allowable pivot, and normally
we would not bother to calculate them all. For hand calculation on problems of this
size, however, we may wish to examine the allowable pivots and select one that will
minimize (at least in the short run) the amount of division required. Thus for this
example we select the pivot (1) in the second column, which results in:

            2      1      1      1      0      0       2
           -3      0     (1)    -2      1      0       1
           -2      0     -1     -2      0      1       2
    r^T    -1      0     -2      1      0      0       2

Second tableau

We note that the objective function (we are using the negative of the original one)
has decreased from zero to minus two. We now pivot on the element marked (1).

           (5)     1      0      3     -1      0       1
           -3      0      1     -2      1      0       1
           -5      0      0     -4      1      1       3
    r^T    -7      0      0     -3      2      0       4

Third tableau

The value of the objective function has now decreased to minus four and we may
pivot in either the first or fourth column. We select (5).

            1     1/5     0     3/5   -1/5     0      1/5
            0     3/5     1    -1/5    2/5     0      8/5
            0      1      0     -1      0      1       4
    r^T     0     7/5     0     6/5    3/5     0     27/5

Fourth tableau

Since the last row has no negative elements, we conclude that the solution
corresponding to the fourth tableau is optimal. Thus $x_1 = 1/5$, $x_2 = 0$, $x_3 = 8/5$,
$x_4 = 0$, $x_5 = 0$, $x_6 = 4$ is the optimal solution with a corresponding value of the
(negative) objective of $-27/5$.




Degeneracy



It is possible that in the course of the simplex procedure, degenerate basic feasible 
solutions may occur. Often they can be handled as a nondegenerate basic feasible 
solution. However, it is possible that after a new column $q$ is selected to enter the
basis, the minimum of the ratios $\bar a_{i0}/\bar a_{iq}$ may be zero, implying that the zero-valued
basic variable is the one to go out. This means that the new variable $x_q$ will come in at
zero value, the objective will not decrease, and the new basic feasible solution will
also be degenerate. Conceivably, this process could continue for a series of steps 
until, finally, the original degenerate solution is again obtained. The result is a cycle 
that could be repeated indefinitely. 

Methods have been developed to avoid such cycles (see Exercises 15-17 for a 
full discussion of one of them, which is based on perturbing the problem slightly 
so that zero-valued variables are actually small positive values, and Exercise 32 for
Bland’s rule, which is simpler). In practice, however, such procedures are found to 
be unnecessary. When degenerate solutions are encountered, the simplex procedure 
generally does not enter a cycle. However, anticycling procedures are simple, and 
many codes incorporate such a procedure for the sake of safety. 



3.5 Finding a Basic Feasible Solution 

A basic feasible solution is sometimes immediately available for linear programs. 
For example, in problems with constraints of the form 

$$\mathbf{A}\mathbf{x} \le \mathbf{b}, \quad \mathbf{x} \ge \mathbf{0} \tag{3.25}$$

with $\mathbf{b} \ge \mathbf{0}$, a basic feasible solution to the corresponding standard form of the
problem is provided by the slack variables. This provides a means for initiating the 
simplex procedure. The example in the last section was of this type. An initial basic 
feasible solution is not always apparent for other types of linear programs, how- 
ever, and it is necessary to develop a means for determining one so that the simplex 
method can be initiated. Interestingly (and fortunately), an auxiliary linear program 
and corresponding application of the simplex method can be used to determine the 
required initial solution. 

By elementary straightforward operations the constraints of a linear program- 
ming problem can always be expressed in the form 



$$\mathbf{A}\mathbf{x} = \mathbf{b}, \quad \mathbf{x} \ge \mathbf{0} \tag{3.26}$$

with $\mathbf{b} \ge \mathbf{0}$. In order to find a solution to (3.26) consider the artificial minimization
problem

$$
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{m} u_i \\
\text{subject to} \quad & \mathbf{A}\mathbf{x} + \mathbf{u} = \mathbf{b} \\
& \mathbf{x} \ge \mathbf{0}, \quad \mathbf{u} \ge \mathbf{0}
\end{aligned}
\tag{3.27}
$$

where $\mathbf{u} = (u_1, u_2, \ldots, u_m)$ is a vector of artificial variables. If there is a feasible
solution to (3.26), then it is clear that (3.27) has a minimum value of zero with $\mathbf{u} = \mathbf{0}$.
If (3.26) has no feasible solution, then the minimum value of (3.27) is greater than
zero.

Now (3.27) is itself a linear program in the variables x, u, and the system is 
already in canonical form with basic feasible solution u = b. If (3.27) is solved 
using the simplex technique, a basic feasible solution is obtained at each step. If the 
minimum value of (3.27) is zero, then the final basic solution will have all $u_i = 0$,
and hence barring degeneracy, the final solution will have no $u_i$ variables basic. If in
the final solution some $u_i$ are both zero and basic, indicating a degenerate solution,
these basic variables can be exchanged for nonbasic $x_j$ variables (again at zero level)
to yield a basic feasible solution involving $\mathbf{x}$ variables only. (However, the situation
is more complex if A is not of full rank. See Exercise 21 .) 
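
Setting up the auxiliary problem (3.27) in canonical form is mechanical, as the following sketch suggests. The function name `phase_one_tableau` is hypothetical; the sketch assumes $\mathbf{b} \ge \mathbf{0}$ and uses as test data the constraints $2x_1 + x_2 + 2x_3 = 4$, $3x_1 + 3x_2 + x_3 = 3$ treated in Example 1 below.

```python
from fractions import Fraction as F

def phase_one_tableau(A, b):
    """Build the canonical tableau of the artificial problem (3.27).

    Returns (T, basis): T has the rows [A | I | b] followed by the relative
    cost row, and basis lists the artificial columns. Assumes every b_i >= 0.
    """
    m, n = len(A), len(A[0])
    rows = [[F(v) for v in A[i]]
            + [F(int(i == k)) for k in range(m)]
            + [F(b[i])] for i in range(m)]
    # Cost row of the artificial objective: zero on x, one on each u_i.
    cost = [F(0)] * n + [F(1)] * m + [F(0)]
    # Subtract each constraint row so the entries under the basic
    # (artificial) columns become zero.
    for row in rows:
        cost = [c - v for c, v in zip(cost, row)]
    return rows + [cost], list(range(n, n + m))

T, basis = phase_one_tableau([[2, 1, 2], [3, 3, 1]], [4, 3])
print(T[-1], basis)
```

The last row produced this way is the row of relative cost coefficients from which the simplex iterations of phase I start.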

Example 1. Find a basic feasible solution to 

$$
\begin{aligned}
2x_1 + x_2 + 2x_3 &= 4 \\
3x_1 + 3x_2 + x_3 &= 3 \\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 &\ge 0.
\end{aligned}
$$

We introduce artificial variables $x_4 \ge 0$, $x_5 \ge 0$ and an objective function $x_4 + x_5$.
The initial tableau is

            x1     x2     x3     x4     x5      b
            2      1      2      1      0       4
            3      3      1      0      1       3
    c^T     0      0      0      1      1       0

Initial tableau

A basic feasible solution to the expanded system is given by the artificial variables.
To initiate the simplex procedure we must update the last row so that it has zero
components under the basic variables. This yields:

            2      1      2      1      0       4
           (3)     3      1      0      1       3
    r^T    -5     -4     -3      0      0      -7

First tableau

Pivoting in the column having the most negative bottom row component as
indicated, we obtain:

            0     -1    (4/3)    1    -2/3      2
            1      1     1/3     0     1/3      1
    r^T     0      1    -4/3     0     5/3     -2

Second tableau







In the second tableau there is only one choice for pivot, and it leads to the final
tableau shown.

            0    -3/4     1     3/4   -1/2     3/2
            1     5/4     0    -1/4    1/2     1/2
    r^T     0      0      0      1      1       0

Final tableau

Both of the artificial variables have been driven out of the basis, thus reducing the
value of the objective function to zero and leading to the basic feasible solution to
the original problem

$$x_1 = 1/2, \quad x_2 = 0, \quad x_3 = 3/2.$$

Using artificial variables, we attack a general linear programming problem by 
use of the two-phase method. This method consists simply of a phase I in which 
artificial variables are introduced as above and a basic feasible solution is found 
(or it is determined that no feasible solutions exist); and a phase II in which, using
the basic feasible solution resulting from phase I, the original objective function 
is minimized. During phase II the artificial variables and the objective function of 
phase I are omitted. Of course, in phase I artificial variables need be introduced only 
in those equations that do not contain slack variables. 
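
A compact sketch of the two-phase method follows. It is illustrative only: the names `pivot`, `simplex`, and `two_phase` are our own, an artificial variable is added to every row (rather than only to rows lacking a slack), $\mathbf{b} \ge \mathbf{0}$ is assumed, and no anticycling rule is included. The usage line applies it to the data of Example 2 below.

```python
from fractions import Fraction as F

def pivot(T, p, q):
    """Pivot the tableau T in place on row p, column q."""
    piv = T[p][q]
    T[p] = [v / piv for v in T[p]]
    for i in range(len(T)):
        if i != p and T[i][q] != 0:
            f = T[i][q]
            T[i] = [a - f * b for a, b in zip(T[i], T[p])]

def simplex(T, basis, ncols):
    """Simplex iterations; entering candidates are columns 0..ncols-1."""
    m = len(T) - 1
    while True:
        q = min(range(ncols), key=lambda j: T[m][j])
        if T[m][q] >= 0:
            return "optimal"
        rows = [i for i in range(m) if T[i][q] > 0]
        if not rows:
            return "unbounded"
        p = min(rows, key=lambda i: T[i][-1] / T[i][q])
        pivot(T, p, q)
        basis[p] = q

def two_phase(A, b, c):
    """Minimize c^T x subject to Ax = b, x >= 0 (b >= 0 assumed).

    Returns None if infeasible, else (status, x, objective value).
    """
    m, n = len(A), len(A[0])
    # Phase I: artificial columns n..n+m-1, objective = sum of artificials.
    T = [[F(v) for v in A[i]] + [F(int(i == k)) for k in range(m)] + [F(b[i])]
         for i in range(m)]
    cost = [F(0)] * n + [F(1)] * m + [F(0)]
    for row in T:
        cost = [cv - v for cv, v in zip(cost, row)]
    T.append(cost)
    basis = list(range(n, n + m))
    simplex(T, basis, n + m)
    if T[m][-1] != 0:
        return None                      # positive minimum: no feasible point
    for i in range(m):                   # exchange zero-level artificials
        if basis[i] >= n:
            q = next((j for j in range(n) if T[i][j] != 0), None)
            if q is None:
                raise ValueError("redundant row: A is not of full rank")
            pivot(T, i, q)
            basis[i] = q
    # Phase II: drop artificial columns, install the original cost row.
    T = [row[:n] + [row[-1]] for row in T[:m]]
    cost = [F(v) for v in c] + [F(0)]
    for i in range(m):
        f = cost[basis[i]]
        cost = [cv - f * v for cv, v in zip(cost, T[i])]
    T.append(cost)
    status = simplex(T, basis, n)
    x = [F(0)] * n
    for i in range(m):
        x[basis[i]] = T[i][-1]
    return status, x, -T[m][-1]

result = two_phase([[2, 1, 2], [3, 3, 1]], [4, 3], [4, 1, 1])
print(result)
```

The exchange loop handles the degenerate case mentioned above, where an artificial variable remains basic at zero level; the rank-deficient case is only detected, not resolved.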

Example 2. Consider the problem 



$$
\begin{aligned}
\text{minimize} \quad & 4x_1 + x_2 + x_3 \\
\text{subject to} \quad & 2x_1 + x_2 + 2x_3 = 4 \\
& 3x_1 + 3x_2 + x_3 = 3 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{aligned}
$$



There is no basic feasible solution apparent, so we use the two-phase method. The 
first phase was done in Example 1 for these constraints, so we shall not repeat it 
here. We give only the final tableau with the columns corresponding to the artificial 
variables deleted, since they are not used in phase II. We use the new cost function 
in place of the old one. Temporarily writing $\mathbf{c}^T$ in the bottom row we have

            x1     x2     x3      b
            0    -3/4     1      3/2
            1     5/4     0      1/2
    c^T     4      1      1       0

Initial tableau

Transforming the last row so that zeros appear in the basic columns, we have

            0    -3/4     1      3/2
            1    (5/4)    0      1/2
    r^T     0   -13/4     0     -7/2

First tableau

           3/5     0      1      9/5
           4/5     1      0      2/5
    r^T   13/5     0      0    -11/5

Second tableau

and hence the optimal solution is $x_1 = 0$, $x_2 = 2/5$, $x_3 = 9/5$.

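Example 3 (A Free Variable Problem).

$$
\begin{aligned}
\text{minimize} \quad & -2x_1 + 4x_2 + 7x_3 + x_4 + 5x_5 \\
\text{subject to} \quad & -x_1 + x_2 + 2x_3 + x_4 + 2x_5 = 7 \\
& -x_1 + 2x_2 + 3x_3 + x_4 + x_5 = 6 \\
& -x_1 + x_2 + x_3 + 2x_4 + x_5 = 4 \\
& x_1 \text{ free},\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0.
\end{aligned}
$$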



Since $x_1$ is free, it can be eliminated, as described in Chap. 2, by solving for $x_1$
in terms of the other variables from the first equation and substituting everywhere
else. This can all be done with the simplex tableau as follows:

            x1     x2     x3     x4     x5      b
           (-1)    1      2      1      2       7
           -1      2      3      1      1       6
           -1      1      1      2      1       4
    c^T    -2      4      7      1      5       0

Initial tableau

We select any nonzero element in the first column to pivot on; this will eliminate $x_1$.

            x1  |  x2     x3     x4     x5      b
            1   |  -1     -2     -1     -2     -7
           -----+-----------------------------------
            0   |   1      1      0     -1     -1
            0   |   0     -1      1     -1     -3
            0   |   2      3     -1      1    -14

Equivalent problem



We now save the first row for future reference, but our linear program only in- 
volves the sub-tableau indicated. There is no obvious basic feasible solution for this 
problem, so we introduce artificial variables $x_6$ and $x_7$.

            x2     x3     x4     x5     x6     x7      b
           -1     -1      0      1      1      0       1
            0      1     -1      1      0      1       3
    c^T     0      0      0      0      1      1       0

Initial tableau for phase I

Transforming the last row appropriately we obtain

            x2     x3     x4     x5     x6     x7      b
           -1     -1      0     (1)     1      0       1
            0      1     -1      1      0      1       3
    r^T     1      0      1     -2      0      0      -4

First tableau, phase I

            x2     x3     x4     x5     x6     x7      b
           -1     -1      0      1      1      0       1
           (1)     2     -1      0     -1      1       2
    r^T    -1     -2      1      0      2      0      -2

Second tableau, phase I

            0      1     -1      1      0      1       3
            1      2     -1      0     -1      1       2
    r^T     0      0      0      0      1      1       0

Final tableau, phase I



Now we go back to the equivalent reduced problem 

            x2     x3     x4     x5      b
            0      1     -1      1       3
            1      2     -1      0       2
    c^T     2      3     -1      1     -14

Initial tableau, phase II

Transforming the last row appropriately we proceed with:

            0      1     -1      1       3
            1     (2)    -1      0       2
    r^T     0     -2      2      0     -21

First tableau, phase II

          -1/2     0    -1/2     1       2
           1/2     1    -1/2     0       1
    r^T     1      0      1      0     -19

Final tableau, phase II

The solution $x_3 = 1$, $x_5 = 2$ can be inserted in the expression for $x_1$ giving

$$x_1 = -7 + 2 \cdot 1 + 2 \cdot 2 = -1;$$

thus the final solution is

$$x_1 = -1, \quad x_2 = 0, \quad x_3 = 1, \quad x_4 = 0, \quad x_5 = 2.$$



3.6 Matrix Form of the Simplex Method

Although the elementary pivot transformations associated with the simplex method 
are in many respects most easily discernible in the tableau format, with attention 
focused on the individual elements, there is much insight to be gained by studying 
a matrix interpretation of the procedure. The vector-matrix relationships that exist 
between the various rows and columns of the tableau lead, however, not only to 
increased understanding but also, in a rather direct way, to the revised simplex pro- 
cedure which in many cases can result in considerable computational advantage. 
The matrix formulation is also a natural setting for the discussion of dual linear 
programs and other topics related to linear programming. 

A preliminary observation in the development is that the tableau at any point in 
the simplex procedure can be determined solely by a knowledge of which variables 
are basic. As before we denote by B the submatrix of the original A matrix consist- 
ing of the m columns of A corresponding to the basic variables. These columns are 
linearly independent and hence the columns of B form a basis for E m . We refer to B 
as the basis matrix. 

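As usual, let us assume that $\mathbf{B}$ consists of the first $m$ columns of $\mathbf{A}$. Then by
partitioning $\mathbf{A}$, $\mathbf{x}$, and $\mathbf{c}^T$ as

$$\mathbf{A} = [\mathbf{B}, \mathbf{D}], \qquad \mathbf{x} = (\mathbf{x}_B, \mathbf{x}_D), \qquad \mathbf{c}^T = [\mathbf{c}_B^T, \mathbf{c}_D^T],$$

the standard linear program becomes

$$
\begin{aligned}
\text{minimize} \quad & \mathbf{c}_B^T \mathbf{x}_B + \mathbf{c}_D^T \mathbf{x}_D \\
\text{subject to} \quad & \mathbf{B}\mathbf{x}_B + \mathbf{D}\mathbf{x}_D = \mathbf{b} \\
& \mathbf{x}_B \ge \mathbf{0}, \quad \mathbf{x}_D \ge \mathbf{0}.
\end{aligned}
\tag{3.28}
$$

The basic solution, which we assume is also feasible, corresponding to the basis
$\mathbf{B}$ is $\mathbf{x} = (\mathbf{x}_B, \mathbf{0})$ where $\mathbf{x}_B = \mathbf{B}^{-1}\mathbf{b}$. The basic solution results from
setting $\mathbf{x}_D = \mathbf{0}$. However, for any value of $\mathbf{x}_D$ the necessary value of $\mathbf{x}_B$ can be
computed from (3.28) as

$$\mathbf{x}_B = \mathbf{B}^{-1}\mathbf{b} - \mathbf{B}^{-1}\mathbf{D}\mathbf{x}_D, \tag{3.29}$$

and this general expression when substituted in the cost function yields

$$
\begin{aligned}
z &= \mathbf{c}_B^T(\mathbf{B}^{-1}\mathbf{b} - \mathbf{B}^{-1}\mathbf{D}\mathbf{x}_D) + \mathbf{c}_D^T\mathbf{x}_D \\
  &= \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{b} + (\mathbf{c}_D^T - \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{D})\mathbf{x}_D,
\end{aligned}
\tag{3.30}
$$

which expresses the cost of any solution to (3.28) in terms of $\mathbf{x}_D$. Thus

$$\mathbf{r}_D^T = \mathbf{c}_D^T - \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{D} \tag{3.31}$$

is the relative cost vector (for nonbasic variables). It is the components of this vector
that are used to determine which vector to bring into the basis.

Having derived the vector expression for the relative cost it is now possible to
write the simplex tableau in matrix form. The initial tableau takes the form

$$
\begin{bmatrix} \mathbf{A} & \mathbf{b} \\ \mathbf{c}^T & 0 \end{bmatrix}
=
\begin{bmatrix} \mathbf{B} & \mathbf{D} & \mathbf{b} \\ \mathbf{c}_B^T & \mathbf{c}_D^T & 0 \end{bmatrix}
\tag{3.32}
$$

which is not in general in canonical form and does not correspond to a point in the
simplex procedure. If the matrix $\mathbf{B}$ is used as a basis, then the corresponding tableau
becomes

$$
\begin{bmatrix}
\mathbf{I} & \mathbf{B}^{-1}\mathbf{D} & \mathbf{B}^{-1}\mathbf{b} \\
\mathbf{0} & \mathbf{c}_D^T - \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{D} & -\mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{b}
\end{bmatrix}
\tag{3.33}
$$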



which is the matrix form we desire. 
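
To make the matrix form concrete, the following sketch computes the blocks of (3.33) directly from a choice of basic columns, without any sequence of pivots. The helper names `solve` and `matrix_form` are hypothetical, and exact fractions are used. The test data are those of Example 1 of Sect. 3.4 with basic variables $x_1$, $x_3$, $x_6$, so the output can be compared with the fourth tableau there.

```python
from fractions import Fraction as F

def solve(M, R):
    """Return X with M X = R, by Gauss-Jordan elimination in exact arithmetic.

    M is n x n, R is n x k (lists of rows). Raises StopIteration if M is singular.
    """
    n = len(M)
    aug = [[F(v) for v in M[i]] + [F(v) for v in R[i]] for i in range(n)]
    for k in range(n):
        piv = next(i for i in range(k, n) if aug[i][k] != 0)
        aug[k], aug[piv] = aug[piv], aug[k]
        aug[k] = [v / aug[k][k] for v in aug[k]]
        for i in range(n):
            if i != k and aug[i][k] != 0:
                f = aug[i][k]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[k])]
    return [row[n:] for row in aug]

def matrix_form(A, b, c, basic):
    """Blocks of the canonical tableau (3.33) for the given basic columns."""
    m, n = len(A), len(A[0])
    nonbasic = [j for j in range(n) if j not in basic]
    B = [[A[i][j] for j in basic] for i in range(m)]
    # Right-hand sides: the columns of D followed by b.
    R = [[A[i][j] for j in nonbasic] + [b[i]] for i in range(m)]
    X = solve(B, R)
    BinvD = [row[:-1] for row in X]          # B^{-1} D
    xB = [row[-1] for row in X]              # B^{-1} b
    cB = [F(c[j]) for j in basic]
    rD = [F(c[j]) - sum(cB[i] * BinvD[i][k] for i in range(m))
          for k, j in enumerate(nonbasic)]   # c_D^T - c_B^T B^{-1} D
    z0 = sum(cB[i] * xB[i] for i in range(m))  # c_B^T B^{-1} b
    return nonbasic, BinvD, xB, rD, z0

A = [[2, 1, 1, 1, 0, 0], [1, 2, 3, 0, 1, 0], [2, 2, 1, 0, 0, 1]]
nonbasic, BinvD, xB, rD, z0 = matrix_form(A, [2, 5, 6], [-3, -1, -3, 0, 0, 0], [0, 2, 5])
print(xB, rD, z0)
```

The lower right-hand entry of the tableau is the negative of the returned `z0`.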



*The Revised Simplex Method and LU Decomposition 

Extensive experience with the simplex procedure applied to problems from various 
fields, and having various values of n and m, has indicated that the method can be 
expected to converge to an optimum solution in about m, or perhaps 3m/2, pivot 
operations. (Except in the worst case. See Chap. 5.) Thus, particularly if m is much 
smaller than n , that is, if the matrix A has far fewer rows than columns, pivots will 
occur in only a small fraction of the columns during the course of optimization. 

Since the other columns are not explicitly used, it appears that the work expended 
in calculating the elements in these columns after each pivot is, in some sense, 
wasted effort. The revised simplex method is a scheme for ordering the compu- 
tations required of the simplex method so that unnecessary calculations are avoided. 
In fact, even if pivoting is eventually required in all columns, but m is small com- 
pared to n , the revised simplex method can frequently save computational effort. 

The revised form of the simplex method is this: Given the inverse $\mathbf{B}^{-1}$ of a current
basis, and the current solution $\mathbf{x}_B = \bar{\mathbf{a}}_0 = \mathbf{B}^{-1}\mathbf{b}$,

Step 1. Calculate the current relative cost coefficients $\mathbf{r}_D^T = \mathbf{c}_D^T - \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{D}$. This
can best be done by first calculating $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$ and then the relative cost vector
$\mathbf{r}_D^T = \mathbf{c}_D^T - \mathbf{y}^T\mathbf{D}$. If $\mathbf{r}_D \ge \mathbf{0}$ stop; the current solution is optimal.

Step 2. Determine which vector $\mathbf{a}_q$ is to enter the basis by selecting the most
negative cost coefficient; and calculate $\bar{\mathbf{a}}_q = \mathbf{B}^{-1}\mathbf{a}_q$ which gives the vector $\mathbf{a}_q$
expressed in terms of the current basis.

Step 3. If no $\bar a_{iq} > 0$, stop; the problem is unbounded. Otherwise, calculate the
ratios $\bar a_{i0}/\bar a_{iq}$ for $\bar a_{iq} > 0$ to determine which vector is to leave the basis.

Step 4. Update $\mathbf{B}^{-1}$ and the current solution $\mathbf{B}^{-1}\mathbf{b}$. Return to Step 1.
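
The four steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `revised_simplex` is our own, the starting basic columns are assumed to form an identity matrix (so that $\mathbf{B}^{-1} = \mathbf{I}$ initially), $\mathbf{B}^{-1}$ is kept explicitly and updated by a pivot, and no anticycling rule is included. The data are again those of Example 1 of Sect. 3.4.

```python
from fractions import Fraction as F

def revised_simplex(A, b, c, basic):
    """Revised simplex with an explicit inverse; returns (status, x).

    Assumes the columns listed in `basic` form the identity, in order.
    """
    m, n = len(A), len(A[0])
    Binv = [[F(int(i == k)) for k in range(m)] for i in range(m)]
    xB = [F(v) for v in b]
    basic = list(basic)
    while True:
        # Step 1: y^T = c_B^T B^{-1}, then r_j = c_j - y^T a_j for nonbasic j.
        y = [sum(F(c[basic[i]]) * Binv[i][k] for i in range(m)) for k in range(m)]
        r = {j: F(c[j]) - sum(y[i] * A[i][j] for i in range(m))
             for j in range(n) if j not in basic}
        q = min(r, key=r.get)
        if r[q] >= 0:
            x = [F(0)] * n
            for i, j in enumerate(basic):
                x[j] = xB[i]
            return "optimal", x
        # Step 2: entering column expressed in the current basis.
        aq = [sum(Binv[i][k] * A[k][q] for k in range(m)) for i in range(m)]
        # Step 3: ratio test.
        rows = [i for i in range(m) if aq[i] > 0]
        if not rows:
            return "unbounded", None
        p = min(rows, key=lambda i: xB[i] / aq[i])
        # Step 4: pivot on aq[p] in the array [B^{-1} | x_B | aq].
        piv = aq[p]
        Binv[p] = [v / piv for v in Binv[p]]
        xB[p] = xB[p] / piv
        for i in range(m):
            if i != p:
                f = aq[i]
                Binv[i] = [a - f * bb for a, bb in zip(Binv[i], Binv[p])]
                xB[i] -= f * xB[p]
        basic[p] = q

A = [[2, 1, 1, 1, 0, 0], [1, 2, 3, 0, 1, 0], [2, 2, 1, 0, 0, 1]]
status, x = revised_simplex(A, [2, 5, 6], [-3, -1, -3, 0, 0, 0], [3, 4, 5])
print(status, x)
```

Only the entering column is transformed at each iteration; the remaining columns of $\mathbf{A}$ are touched only through the inner products of Step 1.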

Updating of $\mathbf{B}^{-1}$ is accomplished by the usual pivot operations applied to an array
consisting of $\mathbf{B}^{-1}$ and $\bar{\mathbf{a}}_q$, where the pivot is the appropriate element in $\bar{\mathbf{a}}_q$. Of course
$\mathbf{B}^{-1}\mathbf{b}$ may be updated at the same time by adjoining it as another column.

One may go one step further in the matrix interpretation of the simplex method
and note that execution of a single simplex cycle is not explicitly dependent on
having $\mathbf{B}^{-1}$ but rather on the ability to solve linear systems with $\mathbf{B}$ as the coefficient
matrix. A decomposition of $\mathbf{B} = \mathbf{L}\mathbf{U}$ can be updated where $\mathbf{L}$ is a lower triangular
matrix and $\mathbf{U}$ is an upper triangular matrix. Then each of the linear systems can be
solved by solving two triangular systems. 
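
As an illustration of the two triangular solves, the sketch below factors a small basis matrix and solves $\mathbf{B}\mathbf{x} = \mathbf{b}$ by forward and back substitution. The function names `lu` and `lu_solve` are hypothetical, the factorization is a plain Doolittle scheme without row exchanges (so it assumes nonzero pivots), and it does not show how the factors would be updated from one basis to the next. The matrix is the optimal basis of Example 1 of Sect. 3.4.

```python
from fractions import Fraction as F

def lu(B):
    """Doolittle factorization B = L U with unit lower triangular L.

    Assumes every pivot U[k][k] is nonzero (no row exchanges are performed).
    """
    n = len(B)
    U = [[F(v) for v in row] for row in B]
    L = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [a - L[i][k] * bk for a, bk in zip(U[i], U[k])]
    return L, U

def lu_solve(L, U, b):
    """Solve L U x = b: forward substitution, then back substitution."""
    n = len(b)
    w = []
    for i in range(n):                       # L w = b (unit diagonal)
        w.append(F(b[i]) - sum(L[i][k] * w[k] for k in range(i)))
    x = [F(0)] * n
    for i in reversed(range(n)):             # U x = w
        x[i] = (w[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

B = [[2, 1, 0], [1, 3, 0], [2, 1, 1]]        # columns a1, a3, a6 of Example 1
L, U = lu(B)
xB = lu_solve(L, U, [2, 5, 6])
print(xB)
```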



3.7 Simplex Method for Transportation Problems 



The transportation problem was stated briefly in Chap. 2. We restate it here. There 
are m origins that contain various amounts of a commodity that must be shipped to n 
destinations to meet demand requirements. Specifically, origin i contains an amount 
$a_i$ and destination $j$ has a requirement of amount $b_j$. It is assumed that the system
is balanced in the sense that total supply equals total demand. That is,

$$\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j. \tag{3.34}$$

The numbers $a_i$ and $b_j$, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$, are assumed to be
nonnegative, and in many applications they are in fact nonnegative integers. There is a
unit cost $c_{ij}$ associated with the shipping of the commodity from origin $i$ to
destination $j$. The problem is to find the shipping pattern between origins and destinations
that satisfies all the requirements and minimizes the total shipping cost.

In mathematical terms the above problem can be expressed as finding a set of
$x_{ij}$'s, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$, to

$$
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{n} x_{ij} = a_i \quad \text{for } i = 1, 2, \ldots, m \\
& \sum_{i=1}^{m} x_{ij} = b_j \quad \text{for } j = 1, 2, \ldots, n \\
& x_{ij} \ge 0 \quad \text{for all } i \text{ and } j.
\end{aligned}
\tag{3.35}
$$

This mathematical problem, together with the assumption (3.34), is the general
transportation problem. In the shipping context, the variables $x_{ij}$ represent the
amounts of the commodity shipped from origin $i$ to destination $j$.

The structure of the problem can be seen more clearly by writing the constraint 
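equations in standard form:

$$
\begin{aligned}
x_{11} + x_{12} + \cdots + x_{1n} &= a_1 \\
x_{21} + x_{22} + \cdots + x_{2n} &= a_2 \\
&\ \ \vdots \\
x_{m1} + x_{m2} + \cdots + x_{mn} &= a_m \\
x_{11} + x_{21} + \cdots + x_{m1} &= b_1 \\
x_{12} + x_{22} + \cdots + x_{m2} &= b_2 \\
&\ \ \vdots \\
x_{1n} + x_{2n} + \cdots + x_{mn} &= b_n
\end{aligned}
\tag{3.36}
$$

The structure is perhaps even more evident when the coefficient matrix $\mathbf{A}$ of the
system of equations above is expressed in vector-matrix notation as

$$
\mathbf{A} =
\begin{bmatrix}
\mathbf{1}^T & & & \\
 & \mathbf{1}^T & & \\
 & & \ddots & \\
 & & & \mathbf{1}^T \\
\mathbf{I} & \mathbf{I} & \cdots & \mathbf{I}
\end{bmatrix}
\tag{3.37}
$$

where $\mathbf{1} = (1, 1, \ldots, 1)$ is $n$-dimensional, and where each $\mathbf{I}$ is an $n \times n$ identity matrix.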

In practice it is usually unnecessary to write out the constraint equations of the 
transportation problem in the explicit form (3.36). A specific transportation problem 
is generally defined by simply presenting the data in compact form, such as: 



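$$
\mathbf{a} = (a_1, a_2, \ldots, a_m), \qquad
\mathbf{b} = (b_1, b_2, \ldots, b_n), \qquad
\mathbf{C} =
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & & \vdots \\
c_{m1} & c_{m2} & \cdots & c_{mn}
\end{bmatrix}.
$$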



The solution can also be represented by an m x n array, and as we shall see, all 
computations can be made on arrays of a similar dimension. 



Example 1. As an example, which will be solved completely in a later section, a 
specific transportation problem with four origins and five destinations is defined by 



$$
\mathbf{a} = (30, 80, 10, 60), \qquad
\mathbf{b} = (10, 50, 20, 80, 20), \qquad
\mathbf{C} =
\begin{bmatrix}
3 & 4 & 6 & 8 & 9 \\
2 & 2 & 4 & 5 & 5 \\
2 & 2 & 2 & 3 & 2 \\
3 & 3 & 2 & 4 & 2
\end{bmatrix}.
$$

Note that the balance requirement is satisfied, since the total supply and the total
demand are both 180.



Finding a Basic Feasible Solution



A first step in the study of the structure of the transportation problem is to show 
that there is always a feasible solution, thus establishing that the problem is well 
defined. A feasible solution can be found by allocating shipments from origins to 
destinations in proportion to supply and demand requirements. Specifically, let $S$
be equal to the total supply (which is also equal to the total demand). Then let
$x_{ij} = a_i b_j / S$ for $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$. The reader can easily verify that
this is a feasible solution. We also note that the solutions are bounded, since each
$x_{ij}$ is bounded by $a_i$ (and by $b_j$). A bounded program with a feasible solution has an
optimal solution. Thus, a transportation problem always has an optimal solution. 
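
The feasibility of the proportional allocation can be checked numerically. The sketch below, with the hypothetical function name `proportional_solution`, builds $x_{ij} = a_i b_j / S$ in exact arithmetic for the data of Example 1 above and verifies the row and column sums.

```python
from fractions import Fraction as F

def proportional_solution(a, b):
    """Return the m x n array x_ij = a_i * b_j / S for a balanced problem."""
    S = sum(a)
    if S != sum(b):
        raise ValueError("problem is not balanced")
    return [[F(ai * bj, S) for bj in b] for ai in a]

a = (30, 80, 10, 60)
b = (10, 50, 20, 80, 20)
x = proportional_solution(a, b)
row_sums = [sum(row) for row in x]
col_sums = [sum(col) for col in zip(*x)]
print(row_sums, col_sums)
```

Every cell of this array is positive, so it uses all $mn$ variables; it is feasible but far from basic.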

A second step in the study of the structure of the transportation problem is based 
on a simple examination of the constraint equations. Clearly there are m equations 
corresponding to origin constraints and n equations corresponding to destination 
constraints — a total of n + m. However, it is easily noted that the sum of the origin 
equations is 

$$\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} = \sum_{i=1}^{m} a_i \tag{3.38}$$

and the sum of the destination equations is

$$\sum_{j=1}^{n} \sum_{i=1}^{m} x_{ij} = \sum_{j=1}^{n} b_j. \tag{3.39}$$

The left-hand sides of these equations are equal. Since they were formed by two
distinct linear combinations of the original equations, it follows that the equations in the
original system are not independent. The right-hand sides of (3.38) and (3.39) are
equal by the assumption that the system is balanced, and therefore the two equations
are, in fact, consistent. However, it is clear that the original system of equations is
redundant. This means that one of the constraints can be eliminated without
changing the set of feasible solutions. Indeed, any one of the constraints can be chosen
as the one to be eliminated, for it can be reconstructed from those remaining. It
follows that a basis for the transportation problem consists of $m + n - 1$ vectors, and
a nondegenerate basic feasible solution consists of $m + n - 1$ variables. The simple
solution found earlier in this section is clearly not a basic solution.

There is a straightforward way to compute an initial basic feasible solution to
a transportation problem. The method is worth studying at this stage because it
introduces the computational process that is the foundation for the general solution
technique based on the simplex method. It also begins to illustrate the fundamental
property of the structure of transportation problems.



The Northwest Corner Rule

This procedure is conducted on the solution array shown below: 



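$$
\begin{array}{ccccc|c}
x_{11} & x_{12} & x_{13} & \cdots & x_{1n} & a_1 \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2n} & a_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x_{m1} & x_{m2} & x_{m3} & \cdots & x_{mn} & a_m \\ \hline
b_1 & b_2 & b_3 & \cdots & b_n &
\end{array}
\tag{3.40}
$$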



The individual elements of the array appear in cells and represent a solution. An 
empty cell denotes a value of zero. 

Beginning with all empty cells, the procedure is given by the following steps:

Step 1. Start with the cell in the upper left-hand corner.

Step 2. Allocate the maximum feasible amount consistent with row and column 
sum requirements involving that cell. (At least one of these requirements will 
then be met.) 

Step 3. Move one cell to the right if there is any remaining row requirement (sup- 
ply). Otherwise move one cell down. If all requirements are met, stop; otherwise 
go to Step 2. 

The procedure is called the Northwest Corner Rule because at each step it selects 
the cell in the upper left-hand corner of the subarray consisting of current nonzero 
row and column requirements. 
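
The rule translates almost directly into code. In the sketch below the function name `northwest_corner` is our own, empty cells are represented by `None`, and when a row and a column requirement are met simultaneously the sketch always moves down and enters a zero there; the rule itself leaves that choice open. The first call uses the data of Example 1 of the last section.

```python
def northwest_corner(a, b):
    """Return an m x n array of allocations; None marks an empty cell.

    a, b are the supplies and demands of a balanced problem. Exactly
    m + n - 1 cells are filled (some possibly with zero).
    """
    a, b = list(a), list(b)
    m, n = len(a), len(b)
    x = [[None] * n for _ in range(m)]
    i = j = 0                                   # Step 1: upper left-hand corner
    while True:
        t = min(a[i], b[j])                     # Step 2: maximum feasible amount
        x[i][j] = t
        a[i] -= t
        b[j] -= t
        if i == m - 1 and j == n - 1:
            return x                            # all requirements met
        if a[i] > 0 or i == m - 1:
            j += 1                              # Step 3: supply remains, move right
        else:
            i += 1                              # otherwise move down

x = northwest_corner([30, 80, 10, 60], [10, 50, 20, 80, 20])
for row in x:
    print(row)

# A degenerate case: the second row and second column are met together.
d = northwest_corner([30, 40, 20, 60], [50, 20, 40, 40])
```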



Example 1. A basic feasible solution constructed by the Northwest Corner Rule is
shown below for Example 1 of the last section.

        10    20     .     .     .  |  30
         .    30    20    30     .  |  80
         .     .     .    10     .  |  10
         .     .     .    40    20  |  60
       -----------------------------+
        10    50    20    80    20

(A dot denotes an empty cell.)





In the first step, at the upper left-hand corner, a maximum of 10 units could be 
allocated, since that is all that was required by column 1. This left 30-10 = 20 
units required in the first row. Next, moving to the second cell in the top row, the 
remaining 20 units were allocated. At this point the row 1 requirement is met, and 
it is necessary to move down to the second row. The reader should be able to follow 
the remaining steps easily. 

There is the possibility that at some point both the row and column requirements 
corresponding to a cell may be met. The next entry will then be a zero, indicating a 
degenerate basic solution. In such a case there is a choice as to where to place the 
zero. One can either move right or move down to enter the zero. Two examples of 
degenerate solutions to a problem are shown below:

        30     .     .     .  |  30          30     .     .     .  |  30
        20    20     .     .  |  40          20    20     0     .  |  40
         .     0    20     .  |  20           .     .    20     .  |  20
         .     .    20    40  |  60           .     .    20    40  |  60
       -----------------------+             -----------------------+
        50    20    40    40                 50    20    40    40

(A dot denotes an empty cell.)


It should be clear that the Northwest Corner Rule can be used to obtain different 
basic feasible solutions by first permuting the rows and columns of the array before 
the procedure is applied. Or equivalently, one can do this indirectly by starting the 
procedure at an arbitrary cell and then considering successive rows and columns in 
an arbitrary order. 



Basis Triangularity 

We now establish the most important structural property of the transportation prob- 
lem: the triangularity of all bases. This property simplifies the process of solution 
of a system of equations whose coefficient matrix corresponds to a basis, and thus 
leads to efficient implementation of the simplex method. 

The concept of upper and lower triangular matrices was introduced in connection 
with Gaussian elimination methods, see Appendix C. It is useful at this point to 
generalize slightly the notion of upper and lower triangularity. 

Definition. A nonsingular square matrix M is said to be triangular if by a permutation of 
its rows and columns it can be put in the form of a lower triangular matrix. 

There is a simple and useful procedure for determining whether a given matrix 
M is triangular: 

Step 1. Find a row with exactly one nonzero entry.

Step 2. Form a submatrix of the matrix used in Step 1 by crossing out the row 
found in Step 1 and the column corresponding to the nonzero entry in that row. 
Return to Step 1 with this submatrix. 

If this procedure can be continued until all rows have been eliminated, then the 
matrix is triangular. It can be put in lower triangular form explicitly by arranging 
the rows and columns in the order that was determined by the procedure. 

Example 1. Shown below on the left is a matrix before the above procedure is
applied to it. The procedure selects the rows in the order 3, 6, 2, 1, 5, 4 and the
corresponding columns in the order 4, 2, 1, 6, 3, 5. Shown at the right is the same
matrix when its rows and columns are permuted according to the order found.

$$
\begin{bmatrix}
1 & 2 & 0 & 1 & 0 & 2 \\
4 & 1 & 0 & 5 & 0 & 0 \\
0 & 0 & 0 & 4 & 0 & 0 \\
2 & 1 & 7 & 2 & 1 & 3 \\
2 & 3 & 2 & 0 & 0 & 3 \\
0 & 2 & 0 & 1 & 0 & 0
\end{bmatrix}
\qquad
\begin{bmatrix}
4 & 0 & 0 & 0 & 0 & 0 \\
1 & 2 & 0 & 0 & 0 & 0 \\
5 & 1 & 4 & 0 & 0 & 0 \\
1 & 2 & 1 & 2 & 0 & 0 \\
0 & 3 & 2 & 3 & 2 & 0 \\
2 & 1 & 2 & 3 & 7 & 1
\end{bmatrix}
$$


Triangularization 



We are now prepared to derive the most important structural property of the trans- 
portation problem. 

Basis Triangularity Theorem. Every basis of the transportation problem is triangular. 

Proof. Refer to the system of constraints (3.36). Let us change the sign of the top 
half of the system; then the coefficient matrix of the system consists of entries that 
are either +1, -1, or 0. Following the result of the theorem in Sect. 3.7, delete any 
one of the equations to eliminate the redundancy. From the resulting coefficient 
matrix, form a basis B by selecting a nonsingular subset of m + n - 1 columns. 

Each column of B contains at most two nonzero entries, a +1 and a -1. Thus 
there are at most 2(m + n - 1) nonzero entries in the basis. However, if every column 
contained two nonzero entries, then the sum of all rows would be zero, contradicting 
the nonsingularity of B. Thus at least one column of B must contain only one 
nonzero entry. This means that the total number of nonzero entries in B is less than 
2(m + n - 1). It then follows that there must be a row with only one nonzero entry; 
for if every row had two or more nonzero entries, the total number would be at least 
2(m + n - 1). This means that the first step of the procedure for verifying triangularity 
is satisfied. A similar argument can be applied to the submatrix of B obtained by 
crossing out the row with the single nonzero entry and the column corresponding to 
that entry; that submatrix must also contain a row with a single nonzero entry. This 
argument can be continued, establishing that the basis B is triangular. ∎ 

Example 2. As an illustration of the Basis Triangularity Theorem, consider the basis 
selected by the Northwest Corner Rule in Example 1. This basis is represented 
below, except that only the basic variables are indicated, not their values. 

    x11   x12                     | 30
          x22   x23   x24         | 80
                      x34         | 10
                      x44   x45   | 60
    ------------------------------+
     10    50    20    80    20



A row in a basis matrix corresponds to an equation in the original system and is 
associated with a constraint either on a row or column sum in the solution array. In 
this example the equation corresponding to the first column sum contains only one 
basis variable, $x_{11}$. The value of this variable can be found immediately to be 10. 
The next equation corresponds to the first row sum. The corresponding variable is 
$x_{12}$, which can be found to be 20, since $x_{11}$ is known. Progression in this manner 
through the basis variables is equivalent to back substitution. 

The importance of triangularity is, of course, the associated method of back 
substitution for the solution of a triangular system of equations, as discussed in 
Appendix C. Moreover, since any basis matrix is triangular and all nonzero ele- 
ments are equal to one (or minus one if the signs of some equations are changed), it 
follows that the process of back substitution will simply involve repeated additions 
and subtractions of the given row and column sums. No multiplication is required. It 
therefore follows that if the original row and column totals are integers, the values of 
all basic variables will be integers. This is an important result, which we summarize 
by a corollary to the Basis Triangularity Theorem. 

Corollary. If the row and column sums of a transportation problem are integers, then the 
basic variables in any basic solution are integers. 
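
The back substitution described above can be sketched in Python. This is an illustrative sketch only: the function name `basic_values`, the representation of a basis as a list of 0-based `(row, column)` cells, and the sample data (the Northwest Corner basis with row sums 30, 80, 10, 60 and column sums 10, 50, 20, 80, 20) are assumptions made for the example.

```python
def basic_values(basic_cells, row_sums, col_sums):
    """Solve for the basic variables of a transportation basis.

    At each step a basic cell is located that is the only undetermined
    cell left in its row or in its column; its value is then the
    remaining row (or column) sum. Only additions and subtractions occur.
    """
    remaining = set(basic_cells)
    r, c = list(row_sums), list(col_sums)
    x = {}
    while remaining:
        for (i, j) in sorted(remaining):
            alone_in_row = sum(1 for (p, q) in remaining if p == i) == 1
            alone_in_col = sum(1 for (p, q) in remaining if q == j) == 1
            if alone_in_row or alone_in_col:
                value = r[i] if alone_in_row else c[j]
                x[(i, j)] = value
                r[i] -= value
                c[j] -= value
                remaining.remove((i, j))
                break
        else:
            raise ValueError("the cells do not form a triangular basis")
    return x


basis = [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 4)]
values = basic_values(basis, [30, 80, 10, 60], [10, 50, 20, 80, 20])
for cell in sorted(values):
    print(cell, values[cell])
```

Because the right-hand sides are integers and no multiplication or division is performed, every value returned is an integer, which is the content of the corollary.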



The Transportation Simplex Method 

Now that the structural properties of the transportation problem have been devel- 
oped, it is a relatively straightforward task to work out the details of the simplex 
method for the transportation problem. A major objective is to exploit fully the tri- 
angularity property of bases in order to achieve both computational efficiency and 
a compact representation of the method. The method used is actually a direct adap- 
tation of the version of the revised simplex method presented in the first part of 
Sect. 3.6. The basis is never inverted; instead, its triangular form is used directly to 
solve for all required variables. 



Simplex Multipliers 

Simplex multipliers are associated with the constraint equations. In this case we 
partition the vector of multipliers as y = (u, v). Here, $u_i$ represents the multiplier 
associated with the ith row sum constraint, and $v_j$ represents the multiplier associated 
with the jth column sum constraint. Since one of the constraints is redundant, 
an arbitrary value may be assigned to any one of the multipliers (see Exercise 4, 
Chap. 4). For notational simplicity we shall at this point set $v_n = 0$. 

Given a basis B, the simplex multipliers are found to be the solution to the 
equation $y^T B = c_B^T$. To determine the explicit form of these equations, we again 
refer to the original system of constraints (3.36). If $x_{ij}$ is basic, then the corresponding 
column from A will be included in B. This column has exactly two +1 entries: 
one in the ith position of the top portion and one in the jth position of the bottom 
portion. This column thus generates the simplex multiplier equation $u_i + v_j = c_{ij}$, 
since $u_i$ and $v_j$ are the corresponding components of the multiplier vector. Overall, 
the simplex multiplier equations are 

\[ u_i + v_j = c_{ij}, \tag{3.42} \]

for all i, j for which $x_{ij}$ is basic. The coefficient matrix of this system is the transpose 
of the basis matrix and hence it is triangular. Thus, this system can be solved by back 
substitution. This is similar to the procedure for finding the values of basic variables 
and, accordingly, as another corollary of the Basis Triangularity Theorem, an integer 
property holds for simplex multipliers. 

Corollary. If the unit costs $c_{ij}$ of a transportation problem are all integers, then (assuming 
one simplex multiplier is set arbitrarily equal to an integer) the simplex multipliers associated 
with any basis are integers. 

Once the simplex multipliers are known, the relative cost coefficients for nonbasic 
variables can be found in the usual manner as $r_D^T = c_D^T - y^T D$. In this case the 
relative cost coefficients are 

\[ r_{ij} = c_{ij} - u_i - v_j \quad \text{for } i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, n. \tag{3.43} \]

This relation is valid for basic variables as well if we define relative cost coefficients 
for them, having value zero. 

Given a basis, computation of the simplex multipliers is quite similar to the calculation 
of the values of the basic variables. The calculation is easily carried out 
on an array of the form shown below, where the circled elements (shown here in 
parentheses) correspond to the positions of the basic variables in the current basis. 

     c_11   (c_12)   c_13    ...    c_1n   | u_1
     c_21   (c_22)   c_23    ...    c_2n   | u_2
      .        .       .             .     |  .
     c_m1     ...                  (c_mn)  | u_m
    ---------------------------------------+
     v_1     v_2      ...           v_n

In this case the main part of the array, with the coefficients $c_{ij}$, remains fixed, and 
we calculate the extra column and row corresponding to u and v. 

The procedure for calculating the simplex multipliers is this: 

Step 1. Assign an arbitrary value to any one of the multipliers. 

Step 2. Scan the rows and columns of the array until a circled element is found 
such that either $u_i$ or $v_j$ (but not both) has already been determined. 

Step 3. Compute the undetermined $u_i$ or $v_j$ from the equation $c_{ij} = u_i + v_j$. If all 
multipliers are determined, stop. Otherwise, return to Step 2. 

The triangularity of the basis guarantees that this procedure can be carried 
through to determine all the simplex multipliers. 
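
A short Python sketch of Steps 1 to 3, together with the relative cost formula (3.43), is given here for illustration. The function names, the 0-based cell indices, and the choice of fixing the last column multiplier at zero are assumptions of the sketch; the cost array and basis are assumed to be those of the Northwest Corner example in this section.

```python
def simplex_multipliers(cost, basic_cells):
    """Compute (u, v) with u[i] + v[j] = cost[i][j] on every basic cell."""
    m, n = len(cost), len(cost[0])
    u, v = [None] * m, [None] * n
    v[n - 1] = 0                      # Step 1: fix one multiplier arbitrarily
    pending = set(basic_cells)
    while pending:
        for (i, j) in sorted(pending):
            # Step 2: a basic cell with exactly one multiplier already known
            if (u[i] is None) != (v[j] is None):
                # Step 3: solve c_ij = u_i + v_j for the unknown one
                if u[i] is None:
                    u[i] = cost[i][j] - v[j]
                else:
                    v[j] = cost[i][j] - u[i]
                pending.remove((i, j))
                break
        else:
            raise ValueError("the cells do not form a basis")
    return u, v


def relative_costs(cost, u, v):
    """Relative cost coefficients r_ij = c_ij - u_i - v_j, as in (3.43)."""
    return [[cost[i][j] - u[i] - v[j] for j in range(len(v))]
            for i in range(len(u))]


cost = [
    [3, 4, 6, 8, 9],
    [2, 2, 4, 5, 5],
    [2, 2, 2, 3, 2],
    [3, 3, 2, 4, 2],
]
basis = [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 4)]

u, v = simplex_multipliers(cost, basis)
r = relative_costs(cost, u, v)
negative_cells = [(i, j) for i in range(4) for j in range(5) if r[i][j] < 0]
print(u, v, negative_cells)
```

Only additions and subtractions of the cost coefficients are used, so integer costs give integer multipliers once one multiplier is fixed at an integer.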
Example 1. Consider the cost array of Example 1 of Sect. 5.1, which is shown below 
with the circled elements (shown here in parentheses) corresponding to a basic feasible 
solution (found by the Northwest Corner Rule). Only these numbers are used in the 
calculation of the multipliers. 

    (3)  (4)   6    8    9
     2   (2)  (4)  (5)   5
     2    2    2   (3)   2
     3    3    2   (4)  (2)

We first arbitrarily set $v_5 = 0$. We then scan the cells, searching for a circled element 
for which only one multiplier must be determined. This is the bottom right corner 
element, and it gives $u_4 = 2$. Then, from the equation $4 = 2 + v_4$, $v_4$ is found to be 2. 
Next, $u_3$ and $u_2$ are determined, then $v_3$ and $v_2$, and finally $u_1$ and $v_1$. The result is 
shown below: 



Cycle of Change 

In accordance with the general simplex procedure, if a nonbasic variable has an 
associated relative cost coefficient that is negative, then that variable is a candidate 
for entry into the basis. As the value of this variable is gradually increased, the 
values of the current basic variables will change continuously in order to maintain 
feasibility. Then, as usual, the value of the new variable is increased precisely to the 
point where one of the old basic variables is driven to zero. 

We must work out the details of how the values of the current basic variables 
change as a new variable is entered. If the new basic vector is d, then the change 
in the other variables is given by $-B^{-1}d$, where B is the current basis. Hence, once 
again we are faced with a problem of solving a system associated with the triangular 
basis, and once again the solution has special properties. In the next theorem recall 
that A is defined by (3.37). 

Theorem. Let B be a basis from A (ignoring one row), and let d be another column. Then 
the components of the vector $w = B^{-1}d$ are either 0, +1, or -1. 

Proof. Let w be the solution to the equation Bw = d. Then w is the representation 
of d in terms of the basis. This equation can be solved by Cramer’s rule as 

\[ w_k = \frac{\det B_k}{\det B}, \]

where $B_k$ is the matrix obtained by replacing the kth column of B by d. Both B and 
$B_k$ are submatrices of the original constraint matrix A. The matrix B may be put 
in triangular form with all diagonal elements equal to +1. Hence, accounting for 
the sign change that may result from the combined row and column interchanges, 
$\det B = +1$ or $-1$. Likewise, it can be shown (see Exercise 3) that $\det B_k = 0$, $+1$, 
or $-1$. We conclude that each component of w is either 0, +1, or -1. ∎ 

The implication of the above result is that when a new variable is added to the 
solution at a unit level, the current basic variables will each change by +1, -1, or 0. 
If the new variable has a value $\theta$, then, correspondingly, the basic variables change 
by $+\theta$, $-\theta$, or 0. It is therefore only necessary to determine the signs of change for 
each basic variable. 

The determination of these signs is again accomplished by row and column scanning. 
Operationally, one assigns a + to the cell of the entering variable to represent 
a change of $+\theta$, where $\theta$ is yet to be determined. Then +’s, -’s, and 0’s are assigned, 
one by one, to the cells of some basic variables, indicating changes of $+\theta$, $-\theta$, or 
0 to maintain a solution. As usual, after each step there will always be an equation 
that uniquely determines the sign to be assigned to another basic variable. The result 
will be a sequence of pluses and minuses assigned to cells that form a cycle leading 
from the cell of the entering variable back to that cell. In essence, the new change is 
part of a cycle of redistribution of the commodity flow in the transportation system. 

Once the sequence of +’s, -’s, and 0’s is determined, the new basic feasible 
solution is found by setting the level of the change $\theta$. This is set so as to drive one 
of the old basic variables to zero. One must simply examine those basic variables 
for which a minus sign has been assigned, for these are the ones that will decrease 
as the new variable is introduced. Then $\theta$ is set equal to the smallest magnitude of 
these variables. This value is added to all cells that have a + assigned to them and 
subtracted from all cells that have a - assigned. The result will be the new basic 
feasible solution. 
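
One way to carry out the sign assignment in code is sketched below. It is not the row and column scanning order used in the text but an equivalent formulation, stated here as an assumption of the sketch: cells that are alone in their row or column cannot lie on the cycle and are discarded (they receive a 0), and the remaining cells are then traversed alternately along rows and columns. The function name `cycle_of_change`, the 0-based cell indices, and the sample basis (a Northwest Corner basis of a 4 x 5 problem with an assumed entering cell) are illustrative.

```python
def cycle_of_change(basic_cells, entering):
    """Return {cell: +1 or -1} for the cells on the cycle of change.

    Basic cells that are absent from the result are unchanged (sign 0).
    """
    cells = set(basic_cells) | {entering}
    # Discard cells that are alone in their row or column; repeat until stable.
    pruned = True
    while pruned:
        pruned = False
        for (i, j) in list(cells):
            in_row = sum(1 for c in cells if c[0] == i)
            in_col = sum(1 for c in cells if c[1] == j)
            if in_row == 1 or in_col == 1:
                cells.remove((i, j))
                pruned = True
    # What is left has exactly two cells in each occupied row and column.
    signs = {entering: +1}
    current, along_row = entering, True
    while True:
        k = 0 if along_row else 1
        nxt = next(c for c in cells if c[k] == current[k] and c != current)
        if nxt == entering:
            break
        signs[nxt] = -signs[current]
        current, along_row = nxt, not along_row
    return signs


basis_values = {(0, 0): 10, (0, 1): 20, (1, 1): 30, (1, 2): 20,
                (1, 3): 30, (2, 3): 10, (3, 3): 40, (3, 4): 20}
signs = cycle_of_change(basis_values, entering=(3, 2))
# theta is the smallest basic variable carrying a minus sign.
theta = min(basis_values[c] for c, s in signs.items() if s < 0)
print(signs, theta)
```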

The procedure is illustrated by the following example. 

Example 2. A completed solution array is shown below: 
In this example $x_{53}$ is the entering variable, so a plus sign is assigned there. The 
signs of the other cells were determined in the order $x_{13}$, $x_{23}$, $x_{25}$, $x_{35}$, $x_{32}$, $x_{31}$, $x_{41}$, 
$x_{51}$, $x_{54}$. The smallest variable with a minus assigned to it is $x_{51} = 10$. Thus we set 
$\theta = 10$. 



The Transportation Simplex Algorithm 

It is now possible to put together the components developed to this point in the form 
of a complete revised simplex procedure for the transportation problem. The steps 
are: 

Step 1. Compute an initial basic feasible solution using the Northwest Corner 
Rule or some other method. 

Step 2. Compute the simplex multipliers and the relative cost coefficients. If all 
relative cost coefficients are nonnegative, stop; the solution is optimal. Otherwise, 
go to Step 3. 

Step 3. Select a nonbasic variable corresponding to a negative cost coefficient to 
enter the basis (usually the one corresponding to the most negative cost coefficient). 
Compute the cycle of change and set $\theta$ equal to the smallest basic variable 
with a minus assigned to it. Update the solution. Go to Step 2. 
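
The three steps can be assembled into a compact Python sketch. It is meant as an illustration under stated assumptions, not as a production implementation: the function name `transportation_simplex` and the dictionary representation of the basis are hypothetical, the problem is assumed balanced, the last column multiplier is fixed at zero, ties for the leaving variable are broken arbitrarily, and no anti-cycling safeguard is included. The cost array and requirements are assumed to be those of the worked example in this section.

```python
def transportation_simplex(cost, supply, demand):
    m, n = len(supply), len(demand)
    # Step 1: Northwest Corner Rule (always yields m + n - 1 basic cells).
    s, d, x = list(supply), list(demand), {}
    i = j = 0
    while True:
        q = min(s[i], d[j])
        x[(i, j)] = q
        s[i] -= q
        d[j] -= q
        if i == m - 1 and j == n - 1:
            break
        if s[i] == 0 and i < m - 1:
            i += 1
        else:
            j += 1
    while True:
        # Step 2: simplex multipliers by scanning, then relative costs.
        u, v = [None] * m, [None] * n
        v[n - 1] = 0
        pending = set(x)
        while pending:
            for (i, j) in sorted(pending):
                if (u[i] is None) != (v[j] is None):
                    if u[i] is None:
                        u[i] = cost[i][j] - v[j]
                    else:
                        v[j] = cost[i][j] - u[i]
                    pending.remove((i, j))
                    break
            else:
                raise ValueError("current cells do not form a basis")
        r = {(i, j): cost[i][j] - u[i] - v[j]
             for i in range(m) for j in range(n) if (i, j) not in x}
        if not r or min(r.values()) >= 0:
            total = sum(cost[i][j] * q for (i, j), q in x.items())
            return x, total
        entering = min(r, key=r.get)      # most negative relative cost
        # Step 3: cycle of change. Cells alone in a row or column are off the cycle.
        cells = set(x) | {entering}
        pruned = True
        while pruned:
            pruned = False
            for (i, j) in list(cells):
                if (sum(1 for c in cells if c[0] == i) == 1
                        or sum(1 for c in cells if c[1] == j) == 1):
                    cells.remove((i, j))
                    pruned = True
        signs, current, along_row = {entering: 1}, entering, True
        while True:
            k = 0 if along_row else 1
            nxt = next(c for c in cells if c[k] == current[k] and c != current)
            if nxt == entering:
                break
            signs[nxt] = -signs[current]
            current, along_row = nxt, not along_row
        minus = [c for c, sg in signs.items() if sg < 0]
        leaving = min(minus, key=lambda c: x[c])
        theta = x[leaving]
        x[entering] = 0
        for c, sg in signs.items():
            x[c] += sg * theta
        del x[leaving]                    # exactly one variable leaves the basis


COST = [[3, 4, 6, 8, 9], [2, 2, 4, 5, 5], [2, 2, 2, 3, 2], [3, 3, 2, 4, 2]]
solution, total = transportation_simplex(COST, [30, 80, 10, 60], [10, 50, 20, 80, 20])
print(sorted(solution.items()), total)
```

A zero-valued basic variable is simply kept in the dictionary with value 0, which plays the role of the perturbation used for degenerate problems.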

Example 3. We can now completely solve the problem that was introduced in Exam- 
ple 1 of the first section. The requirements and a first basic feasible solution obtained 
by the Northwest Corner Rule are shown below. The plus and minus signs indicated 
on the array should be ignored at this point, since they cannot be computed until the 
next step is completed. 
The cost coefficients of the problem are shown in the array below, with the circled 
cells corresponding to the current basic variables. The simplex multipliers, computed 
by row and column scanning, are shown as well. 

The relative cost coefficients are found by subtracting $u_i + v_j$ from $c_{ij}$. In this case 
the only negative result is in cell 4,3; so variable $x_{43}$ will be brought into the basis. 
Thus a + is entered into this cell in the original array, and the cycle of zeros and plus 
and minus signs is determined as shown in that array. (It is not necessary to continue 
scanning once a complete cycle is determined.) 

The smallest basic variable with a minus sign is 20 and, accordingly, 20 is added 
or subtracted from elements of the cycle as indicated by the signs. This leads to the 
new basic feasible solution shown in the array below: 

The new simplex multipliers corresponding to the new basis are computed, and 
the cost array is revised as shown below. In this case all relative cost coefficients are 
positive, indicating that the current solution is optimal. 



Degeneracy 

As in all linear programming problems, degeneracy, corresponding to a basic variable 
having the value zero, can occur in the transportation problem. If degeneracy 
is encountered in the simplex procedure, it can be handled quite easily by introduction 
of the standard perturbation method (see Exercise 15, Chap. 3). In this method 
a zero-valued basic variable is assigned the value $\varepsilon$ and is then treated in the usual 
way. If it later leaves the basis, then the $\varepsilon$ can be dropped. 

Example 4. To illustrate the method of dealing with degeneracy, consider a modification 
of Example 3, with the fourth row sum changed from 60 to 20 and the fourth 
column sum changed from 80 to 40. Then the initial basic feasible solution found 
by the Northwest Corner Rule is degenerate. An $\varepsilon$ is placed in the array for the 
zero-valued basic variable as shown below: 

The relative cost coefficients will be the same as in Example 3, and hence again 
$x_{43}$ should be chosen to enter, and the cycle of change is the same as before. In 
this case, however, the change is only $\varepsilon$, and variable $x_{44}$ leaves the basis. The new 
relative cost coefficients are all positive, indicating that the new solution is optimal. 
Now the $\varepsilon$ can be dropped to yield the final solution (which is, itself, degenerate in 
this case). 



*3.8 Decomposition 



Large linear programming problems usually have some special structural form that 
can (and should) be exploited to develop efficient computational procedures. One 
common structure is where there are a number of separate activity areas that are 
linked through common resource constraints. An example is provided by a multidi- 
visional firm attempting to minimize the total cost of its operations. The divisions 
of the firm must each meet internal requirements that do not interact with the con- 
straints of other divisions; but in addition there are common resources that must be 
shared among divisions and thereby represent linking constraints. 

A problem of this form can be solved by the Dantzig-Wolfe decomposition 
method described in this section. The method is an iterative process where at each 
step a number of separate subproblems are solved. The subproblems are themselves 
linear programs within the separate areas (or within divisions in the example of 
the firm). The objective functions of these subproblems are varied from iteration to 
iteration and are determined by a separate calculation based on the results of the 
previous iteration. This action coordinates the individual subproblems so that, ultimately, 
the overall problem is solved. The method can be derived as 
a special version of the revised simplex method, where the subproblems correspond 
to evaluation of reduced cost coefficients for the main problem. 

To describe the method we consider the linear program in standard form 



\[
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & Ax = b, \quad x \geq 0.
\end{aligned}
\tag{3.44}
\]



Suppose, for purposes of this entire section, that the A matrix has the special “block-angular” 
structure: 

\[
A = \begin{bmatrix}
L_1 & L_2 & \cdots & L_N \\
A_1 & & & \\
& A_2 & & \\
& & \ddots & \\
& & & A_N
\end{bmatrix}
\tag{3.45}
\]

By partitioning the vectors x, $c^T$, and b consistent with this partition of A, the 
problem can be rewritten as 

\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{N} c_i^T x_i \\
\text{subject to} \quad & \sum_{i=1}^{N} L_i x_i = b_0 \\
& A_i x_i = b_i, \quad x_i \geq 0, \quad i = 1, \ldots, N.
\end{aligned}
\tag{3.46}
\]

This may be viewed as a problem of minimizing the total cost of N different linear 
programs that are independent except for the first constraint, which is a linking 
constraint of, say, dimension m. 

Each of the subproblems is of the form 

\[
\begin{aligned}
\text{minimize} \quad & c_i^T x_i \\
\text{subject to} \quad & A_i x_i = b_i, \quad x_i \geq 0.
\end{aligned}
\]

The constraint set for the ith subproblem is $S_i = \{x_i : A_i x_i = b_i,\ x_i \geq 0\}$. As 
for any linear program, this constraint set $S_i$ is a polytope and can be expressed 
as the intersection of a finite number of closed half-spaces. There is no guarantee 
that each $S_i$ is bounded, even if the original linear program (3.44) has a bounded 
constraint set. We shall assume for simplicity, however, that each of the polytopes 
$S_i$, i = 1, ..., N, is indeed bounded and hence is a polyhedron. One may guarantee 
that this assumption is satisfied by placing artificial (large) upper bounds on each $x_i$. 

Under the boundedness assumption, each polyhedron $S_i$ consists entirely of 
points that are convex combinations of its extreme points. Thus, if the extreme points 
of $S_i$ are $\{x_{i1}, x_{i2}, \ldots, x_{iK_i}\}$, then any point $x_i \in S_i$ can be expressed in the form 

\[
x_i = \sum_{j=1}^{K_i} \alpha_{ij} x_{ij},
\quad \text{where } \sum_{j=1}^{K_i} \alpha_{ij} = 1
\text{ and } \alpha_{ij} \geq 0, \; j = 1, \ldots, K_i.
\tag{3.47}
\]

The $\alpha_{ij}$’s are the weighting coefficients of the extreme points. 

We now convert the original linear program to an equivalent master problem, 
of which the objective is to find the optimal weighting coefficients for each polyhedron 
$S_i$. Corresponding to each extreme point $x_{ij}$ in $S_i$, define $p_{ij} = c_i^T x_{ij}$ and 
$q_{ij} = L_i x_{ij}$. Clearly $p_{ij}$ is the equivalent cost of the extreme point $x_{ij}$, and $q_{ij}$ is its 
equivalent activity vector in the linking constraints. 
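Then the original linear program (3.44) is equivalent, using (3.47), to the master 
problem: 

\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{N} \sum_{j=1}^{K_i} p_{ij} \alpha_{ij} \\
\text{subject to} \quad & \sum_{i=1}^{N} \sum_{j=1}^{K_i} q_{ij} \alpha_{ij} = b_0 \\
& \sum_{j=1}^{K_i} \alpha_{ij} = 1, \quad i = 1, \ldots, N \\
& \alpha_{ij} \geq 0, \quad j = 1, \ldots, K_i, \quad i = 1, \ldots, N.
\end{aligned}
\tag{3.48}
\]

This master problem has variables 

\[ \alpha^T = (\alpha_{11}, \ldots, \alpha_{1K_1}, \alpha_{21}, \ldots, \alpha_{2K_2}, \ldots, \alpha_{N1}, \ldots, \alpha_{NK_N}) \]

and can be expressed more compactly as 

\[
\begin{aligned}
\text{minimize} \quad & p^T \alpha \\
\text{subject to} \quad & Q\alpha = g, \quad \alpha \geq 0,
\end{aligned}
\tag{3.49}
\]

where $g^T = [b_0^T, 1, 1, \ldots, 1]$; the element of p associated with $\alpha_{ij}$ is $p_{ij}$; and the 
column of Q associated with $\alpha_{ij}$ is 

\[ \begin{bmatrix} q_{ij} \\ e_i \end{bmatrix}, \]

with $e_i$ denoting the ith unit vector in $E^N$. 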

Suppose that at some stage of the revised simplex method for the master problem 
we know the basis B and corresponding simplex multipliers $y^T = p_B^T B^{-1}$. The 
corresponding relative cost vector is $r_D^T = c_D^T - y^T D$, having components 

\[ r_{ij} = p_{ij} - y^T \begin{bmatrix} q_{ij} \\ e_i \end{bmatrix}. \tag{3.50} \]

It is not necessary to calculate all the $r_{ij}$’s; it is only necessary to determine the 
minimal $r_{ij}$. If the minimal value is nonnegative, the current solution is optimal and 
the process terminates. If, on the other hand, the minimal element is negative, the 
corresponding column should enter the basis. 

The search for the minimal element in (3.50) is normally made with respect 
to nonbasic columns only. The search can be formally extended to include basic 
columns as well, however, since for basic elements 

\[ p_{ij} - y^T \begin{bmatrix} q_{ij} \\ e_i \end{bmatrix} = 0. \]
The extra zero values do not influence the subsequent procedure, since a new column 
will enter only if the minimal value is less than zero. 

We therefore define $r^*$ as the minimum relative cost coefficient for all possible 
basis vectors. That is, 

\[
r^* = \min_{i \in \{1, \ldots, N\}} \left\{ r_i^* = \min_{j \in \{1, \ldots, K_i\}}
\left\{ p_{ij} - y^T \begin{bmatrix} q_{ij} \\ e_i \end{bmatrix} \right\} \right\}.
\]

Using the definitions of $p_{ij}$ and $q_{ij}$, this becomes 

\[ r_i^* = \min_{j \in \{1, \ldots, K_i\}} \left\{ c_i^T x_{ij} - y_0^T L_i x_{ij} - y_{m+i} \right\}, \tag{3.51} \]

where $y_0$ is the vector made up of the first m elements of y, m being the number of 
rows of $L_i$ [the number of linking constraints in (3.46)]. 

The minimization problem in (3.51) is actually solved by the ith subproblem: 

\[
\begin{aligned}
\text{minimize} \quad & (c_i^T - y_0^T L_i) x_i \\
\text{subject to} \quad & A_i x_i = b_i, \quad x_i \geq 0.
\end{aligned}
\tag{3.52}
\]

This follows from the fact that $y_{m+i}$ is independent of the extreme point index j 
(since y is fixed during the determination of the $r_i^*$’s), and that the solution of (3.52) 
must be that extreme point of $S_i$, say $x_{ik}$, of minimum cost, using the adjusted cost 
coefficients $c_i^T - y_0^T L_i$. 

Thus, an algorithm for this special version of the revised simplex method applied 
to the master problem is the following: Given a basis B 

Step 1. Calculate the current basic solution $x_B$, and solve $y^T B = c_B^T$ for y. 

Step 2. For each i = 1, 2, ..., N, determine the optimal solution $x_i^*$ of the ith 
subproblem (3.52) and calculate 

\[ r_i^* = (c_i^T - y_0^T L_i) x_i^* - y_{m+i}. \tag{3.53} \]

If all $r_i^* \geq 0$, stop; the current solution is optimal. 

Step 3. Determine which column is to enter the basis by selecting the minimal $r_i^*$. 

Step 4. Update the basis of the master problem as usual. 
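
The pricing computation of Steps 2 and 3 can be illustrated with a small Python sketch. To keep it self-contained it assumes that the extreme points of each $S_i$ are listed explicitly, so each subproblem (3.52) is solved by enumeration; in practice a linear programming solver would be called instead. The function names, the tiny two-block data, and the multiplier values are all hypothetical and serve only to exercise formulas (3.51) and (3.53).

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def price_out(c, L, extreme_points, y):
    """Return (r_star, i, x_star) for the subproblem with the smallest r_i*.

    c[i]: cost vector of block i; L[i]: linking matrix of block i (m rows);
    extreme_points[i]: explicit list of the extreme points of S_i;
    y: multipliers of the master problem, of length m + N.
    """
    m = len(L[0])
    y0 = y[:m]
    best = None
    for i, points in enumerate(extreme_points):
        # Adjusted cost coefficients c_i^T - y0^T L_i of subproblem (3.52).
        adjusted = [c[i][k] - sum(y0[row] * L[i][row][k] for row in range(m))
                    for k in range(len(c[i]))]
        x_star = min(points, key=lambda x: dot(adjusted, x))
        r_i = dot(adjusted, x_star) - y[m + i]          # formula (3.53)
        if best is None or r_i < best[0]:
            best = (r_i, i, x_star)
    return best


def master_column(c_i, L_i, x, i, N):
    """Cost p_ij and column [q_ij; e_i] of the master problem for extreme point x."""
    p = dot(c_i, x)
    q = [dot(row, x) for row in L_i]
    e = [1 if k == i else 0 for k in range(N)]
    return p, q + e


# Two blocks, two variables each, one linking constraint (m = 1).
c = [[1, 2], [3, 1]]
L = [[[1, 1]], [[2, 1]]]
points = [[(0, 0), (2, 0), (0, 3)], [(0, 0), (1, 0), (0, 4)]]
y = [2, 0, -1]                      # illustrative multipliers (y_0, y_{m+1}, y_{m+2})

r_star, i, x_star = price_out(c, L, points, y)
if r_star < 0:
    p, column = master_column(c[i], L[i], x_star, i, N=2)
    print(r_star, i, x_star, p, column)
```

With these numbers block 2 (index 1) reports the most negative value, so its extreme point supplies the column that enters the master basis.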

This algorithm has an interesting economic interpretation in the context of a 
multidivisional firm minimizing its total cost of operations as described earlier. 
Division i’s activities are internally constrained by $A_i x_i = b_i$, and the common 
resources $b_0$ impose linking constraints. At Step 1 of the algorithm, the firm’s central 
management formulates its current master plan, which is perhaps suboptimal, and 
announces a new set of prices that each division must use to revise its recommended 
strategy at Step 2. In particular, $-y_0$ reflects the new prices that higher management 
has placed on the common resources. The division that reports the greatest rate of 
potential cost improvement has its recommendations incorporated in the new mas- 
ter plan at Step 3, and the process is repeated. If no cost improvement is possible, 
central management settles on the current master plan. 



3.9 Summary 



The simplex method is founded on the fact that the optimal value of a linear pro- 
gram, if finite, is always attained at a basic feasible solution. Using this foundation 
there are two ways in which to visualize the simplex process. The first is to view the 
process as one of continuous change. One starts with a basic feasible solution and 
imagines that some nonbasic variable is increased slowly from zero. As the value of 
this variable is increased, the values of the current basic variables are continuously 
adjusted so that the overall vector continues to satisfy the system of linear equality 
constraints. The change in the objective function due to a unit change in this non- 
basic variable, taking into account the corresponding required changes in the values 
of the basic variables, is the relative cost coefficient associated with the nonbasic 
variable. If this coefficient is negative, then the objective value will be continuously 
improved as the value of this nonbasic variable is increased, and therefore one inc- 
reases the variable as far as possible, to the point where further increase would 
violate feasibility. At this point the value of one of the basic variables is zero, and 
that variable is declared nonbasic, while the nonbasic variable that was increased is 
declared basic. 

The other viewpoint is more discrete in nature. Realizing that only basic feasible 
solutions need be considered, various bases are selected and the corresponding basic 
solutions are calculated by solving the associated set of linear equations. The logic 
for the systematic selection of new bases again involves the relative cost coefficients 
and, of course, is derived largely from the first, continuous, viewpoint. 

Problems of special structure are important both for applications and for theory. 
The transportation problem represents an important class of linear programs with 
structural properties that lead to an efficient implementation of the simplex method. 
The most important property of the transportation problem is that any basis is trian- 
gular. This means that the basic variables can be found, one by one, directly by back 
substitution, and the basis need never be inverted. Likewise, the simplex multipli- 
ers can be found by back substitution, since they solve a set of equations involving 
the transpose of the basis. Moreover, when any basis matrix is triangular and all 
nonzero elements are equal to one (or minus one if the signs of some equations 
are changed), it follows that the process of back substitution will simply involve 
repeated additions and subtractions of the given row and column sums. No multiplication 
or division is required. It therefore follows that if the original right-hand 
sides are integers, the values of all basic variables will be integers. Hence, an optimal 
basic solution in which each entry is integral always exists; that is, there is no 
gap between the continuous linear program and the integer linear program (the integrality 
gap is zero). The transportation problem can be generalized to a minimum cost 
flow problem in a network. This leads to the interpretation of a simplex basis as 
corresponding to a spanning tree in the network; see Appendix D. 

Many linear programming implementations include a presolver procedure to 
eliminate redundant or duplicate constraints and variables whose values are fixed, and to 
check for possible constraint inconsistency and unboundedness. This typically results 
in problem size reduction and possible infeasibility detection. 



3.10 Exercises 



1. Using pivoting, solve the simultaneous equations 

\[
\begin{aligned}
3x_1 + 2x_2 &= 5 \\
5x_1 + x_2 &= 9.
\end{aligned}
\]

2. Using pivoting, solve the simultaneous equations 

\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 7 \\
2x_1 - x_2 + 2x_3 &= 6 \\
x_1 + x_2 + 3x_3 &= 12.
\end{aligned}
\]



3. Solve the equations in Exercise 2 by Gaussian elimination as described in 
Appendix C. 

4. Suppose B is an m × m square nonsingular matrix, and let the tableau T be 
constructed, T = [I, B] where I is the m × m identity matrix. Suppose that pivot 
operations are performed on this tableau so that it takes the form [C, I]. Show 
that $C = B^{-1}$. 

5. Show that if the vectors $a_1, a_2, \ldots, a_m$ are a basis in $E^m$, the vectors 
$a_1, a_2, \ldots, a_{p-1}, a_q, a_{p+1}, \ldots, a_m$ also are a basis if and only if $a_{pq} \neq 0$, 
where $a_{pq}$ is defined by the tableau (3.5). 

6. If $r_j > 0$ for every j corresponding to a variable $x_j$ that is not basic, show that 
the corresponding basic feasible solution is the unique optimal solution. 

7. Show that a degenerate basic feasible solution may be optimal without satisfying 
$r_j \geq 0$ for all j. 

8. 

(a) Using the simplex procedure, solve 

\[
\begin{aligned}
\text{maximize} \quad & -x_1 + x_2 \\
\text{subject to} \quad & x_1 - x_2 \leq 2 \\
& x_1 + x_2 \leq 6 \\
& x_1 \geq 0, \; x_2 \geq 0.
\end{aligned}
\]

(b) Draw a graphical representation of the problem in $x_1$, $x_2$ space and indicate the 
path of the simplex steps. 

(c) Repeat for the problem 

\[
\begin{aligned}
\text{maximize} \quad & x_1 + x_2 \\
\text{subject to} \quad & -2x_1 + x_2 \leq 1 \\
& x_1 - x_2 \leq 1 \\
& x_1 \geq 0, \; x_2 \geq 0.
\end{aligned}
\]

9. Using the simplex procedure, solve the spare-parts manufacturer’s problem 
(Exercise 4, Chap. 2). 

10. Using the simplex procedure, solve 

\[
\begin{aligned}
\text{minimize} \quad & 2x_1 + 4x_2 + x_3 + x_4 \\
\text{subject to} \quad & x_1 + 3x_2 + x_4 \leq 4 \\
& 2x_1 + x_2 \leq 3 \\
& x_2 + 4x_3 + x_4 \leq 3 \\
& x_i \geq 0, \quad i = 1, 2, 3, 4.
\end{aligned}
\]

11. For the linear program of Exercise 10 

(a) How much can the elements of b = (4, 3, 3) be changed without changing the 
optimal basis? 

(b) How much can the elements of c = (2, 4, 1, 1) be changed without changing the 
optimal basis? 

(c) What happens to the optimal cost for small changes in b? 

(d) What happens to the optimal cost for small changes in c? 

12. Consider the problem 

\[
\begin{aligned}
\text{minimize} \quad & x_1 - 3x_2 - 0.4x_3 \\
\text{subject to} \quad & 3x_1 - x_2 + 2x_3 \leq 7 \\
& -2x_1 + 4x_2 \leq 12 \\
& -4x_1 + 3x_2 + 3x_3 \leq 14 \\
& x_1 \geq 0, \; x_2 \geq 0, \; x_3 \geq 0.
\end{aligned}
\]

(a) Find an optimal solution. 

(b) How many optimal basic feasible solutions are there? 

(c) Show that if $c_4 + \tfrac{1}{5} a_{14} + \tfrac{4}{5} a_{24} \geq 0$, then another activity $x_4$ can be introduced 
with cost coefficient $c_4$ and activity vector $(a_{14}, a_{24}, a_{34})$ without changing the 
optimal solution. 

13. Rather than select the variable corresponding to the most negative relative cost 
coefficient as the variable to enter the basis, it has been suggested that a better 
criterion would be to select that variable which, when pivoted in, will produce 
the greatest improvement in the objective function. Show that this criterion 
leads to selecting the variable $x_k$ corresponding to the index k minimizing 
$\max_{i,\, a_{ik} > 0} r_k a_{i0}/a_{ik}$. 

14. In the ordinary simplex method one new vector is brought into the basis and 
one removed at every step. Consider the possibility of bringing two new vectors 
into the basis and removing two at each stage. Develop a complete procedure 
that operates in this fashion. 

15. Degeneracy. If a basic feasible solution is degenerate, it is then theoretically
possible that a sequence of degenerate basic feasible solutions will be generated
that endlessly cycles without making progress. It is the purpose of this exercise
and the next two to develop a technique that can be applied to the simplex
method to avoid this cycling.
Corresponding to the linear system $\mathbf{Ax} = \mathbf{b}$ where $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n]$ define
the perturbed system $\mathbf{Ax} = \mathbf{b}(\varepsilon)$ where
$\mathbf{b}(\varepsilon) = \mathbf{b} + \varepsilon\mathbf{a}_1 + \varepsilon^2\mathbf{a}_2 + \cdots + \varepsilon^n\mathbf{a}_n$, $\varepsilon > 0$.
Show that if there is a basic feasible solution (possibly degenerate) to the
unperturbed system with basis $\mathbf{B} = [\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_m]$, then corresponding to
the same basis, there is a nondegenerate basic feasible solution to the perturbed
system for some range of $\varepsilon > 0$.

16. Show that corresponding to any basic feasible solution to the perturbed system
of Exercise 15, which is nondegenerate for some range of $\varepsilon > 0$, and to a vector
$\mathbf{a}_k$ not in the basis, there is a unique vector $\mathbf{a}_j$ in the basis which when replaced
by $\mathbf{a}_k$ leads to a basic feasible solution; and that solution is nondegenerate for a
range of $\varepsilon > 0$.

17. Show that the tableau associated with a basic feasible solution of the perturbed
system of Exercise 15, and which is nondegenerate for a range of $\varepsilon > 0$, is
identical with that of the unperturbed system except in the column under $\mathbf{b}(\varepsilon)$.
Show how the proper pivot in a given column to preserve feasibility of the
perturbed system can be determined from the tableau of the unperturbed system.
Conclude that the simplex method will avoid cycling if whenever there is a
choice in the pivot element of a column $k$, arising from a tie in the minimum of
$a_{i0}/a_{ik}$ among the elements $i \in I_0$, the tie is resolved by finding the minimum
of $a_{i1}/a_{ik}$, $i \in I_0$. If there still remain ties among elements $i \in I_1$, the process is
repeated with $a_{i2}/a_{ik}$, etc., until there is a unique element.

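18. Using the two-phase simplex procedure solve

(a)

$$
\begin{array}{ll}
\text{minimize} & -3x_1 + x_2 + 3x_3 - x_4 \\
\text{subject to} & x_1 + 2x_2 - x_3 + x_4 = 0 \\
& 2x_1 - 2x_2 + 3x_3 + 3x_4 = 9 \\
& x_1 - x_2 + 2x_3 - x_4 = 6 \\
& x_i \ge 0,\ i = 1, 2, 3, 4.
\end{array}
$$

(b)

$$
\begin{array}{ll}
\text{minimize} & x_1 + 6x_2 - 7x_3 + x_4 + 5x_5 \\
\text{subject to} & 5x_1 - 4x_2 + 13x_3 - 2x_4 + x_5 = 20 \\
& x_1 - x_2 + 5x_3 - x_4 + x_5 = 8 \\
& x_i \ge 0,\ i = 1, 2, 3, 4, 5.
\end{array}
$$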

19. Solve the oil refinery problem (Exercise 3, Chap. 2). 

20. Show that in the phase I procedure of a problem that has feasible solutions, if an 
artificial variable becomes nonbasic, it need never again be made basic. Thus, 
when an artificial variable becomes nonbasic its column can be eliminated from 
future tableaus. 

21. Suppose the phase I procedure is applied to the system $\mathbf{Ax} = \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$, and that
the resulting tableau (ignoring the cost row) has the form

This corresponds to having $m - k$ basic artificial variables at zero level.

(a) Show that any nonzero element in $\mathbf{R}_2$ can be used as a pivot to eliminate a basic
artificial variable, thus yielding a similar tableau but with $k$ increased by one.

(b) Suppose that the process in (a) has been repeated to the point where $\mathbf{R}_2 = \mathbf{0}$.
Show that the original system is redundant, and show how phase II may proceed
by eliminating the bottom rows.

(c) Use the above method to solve the linear program

$$
\begin{array}{ll}
\text{minimize} & 2x_1 + 6x_2 + x_3 + x_4 \\
\text{subject to} & x_1 + 2x_2 + x_4 = 6 \\
& x_1 + 2x_2 + x_3 + x_4 = 7 \\
& x_1 + 3x_2 - x_3 + 2x_4 = 7 \\
& x_1 + x_2 + x_3 = 5 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0.
\end{array}
$$

22. Find a basic feasible solution to

$$
\begin{array}{l}
x_1 + 2x_2 - x_3 + x_4 = 3 \\
2x_1 + 4x_2 + x_3 + 2x_4 = 12 \\
x_1 + 4x_2 + 2x_3 + x_4 = 9 \\
x_i \ge 0,\ i = 1, 2, 3, 4.
\end{array}
$$

23. Consider the system of linear inequalities $\mathbf{Ax} \ge \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$ with $\mathbf{b} \ge \mathbf{0}$. This
system can be transformed to standard form by the introduction of $m$ surplus
variables so that it becomes $\mathbf{Ax} - \mathbf{y} = \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$, $\mathbf{y} \ge \mathbf{0}$. Let $b_k = \max_i b_i$ and
consider the new system in standard form obtained by adding the $k$th row to the
negative of every other row. Show that the new system requires the addition of
only a single artificial variable to obtain an initial basic feasible solution.

Use this technique to find a basic feasible solution to the system

$$
\begin{array}{l}
x_1 + 2x_2 + x_3 \ge 4 \\
2x_1 + x_2 + x_3 \ge 5 \\
2x_1 + 3x_2 + 2x_3 \ge 6 \\
x_i \ge 0,\ i = 1, 2, 3.
\end{array}
$$

24. It is possible to combine the two phases of the two-phase method into a single
procedure by the big-M method. Given the linear program in standard form

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{Ax} = \mathbf{b},\ \mathbf{x} \ge \mathbf{0},
\end{array}
$$

one forms the approximating problem

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} + M\sum_{i=1}^{m} u_i \\
\text{subject to} & \mathbf{Ax} + \mathbf{u} = \mathbf{b} \\
& \mathbf{x} \ge \mathbf{0},\ \mathbf{u} \ge \mathbf{0}.
\end{array}
$$

In this problem $\mathbf{u} = (u_1, u_2, \dots, u_m)$ is a vector of artificial variables and $M$ is
a large constant. The term $M\sum_{i=1}^{m} u_i$ serves as a penalty term for nonzero $u_i$'s.

If this problem is solved by the simplex method, show the following:

(a) If an optimal solution is found with $\mathbf{u} = \mathbf{0}$, then the corresponding $\mathbf{x}$ is an
optimal basic feasible solution to the original problem.

(b) If for every $M > 0$ an optimal solution is found with $\mathbf{u} \ne \mathbf{0}$, then the original
problem is infeasible.

(c) If for every $M > 0$ the approximating problem is unbounded, then the original
problem is either unbounded or infeasible.

(d) Suppose now that the original problem has a finite optimal value $V(\infty)$. Let
$V(M)$ be the optimal value of the approximating problem. Show that
$V(M) \le V(\infty)$.

(e) Show that for $M_1 \le M_2$ we have $V(M_1) \le V(M_2)$.

(f) Show that there is a value $M_0$ such that for $M \ge M_0$, $V(M) = V(\infty)$, and hence
conclude that the big-M method will produce the right solution for large enough
values of $M$.

25. Using the revised simplex method find a basic feasible solution to

$$
\begin{array}{l}
x_1 + 2x_2 - x_3 + x_4 = 3 \\
2x_1 + 4x_2 + x_3 + 2x_4 = 12 \\
x_1 + 4x_2 + 2x_3 + x_4 = 9 \\
x_i \ge 0,\ i = 1, 2, 3, 4.
\end{array}
$$

26. The following tableau is an intermediate stage in the solution of a minimization
problem:

$$
\begin{array}{c|rrrrrr|r}
 & y_1 & y_2 & y_3 & y_4 & y_5 & y_6 & y_0 \\
\hline
 & 1 & 2/3 & 0 & 0 & 4/3 & 0 & 4 \\
 & 0 & -7/3 & 3 & 1 & -2/3 & 0 & 2 \\
 & 0 & -2/3 & -2 & 0 & 2/3 & 1 & 2 \\
\hline
\mathbf{r}^T & 0 & 8/3 & -11 & 0 & 4/3 & 0 & -8
\end{array}
$$

(a) Determine the next pivot element.

(b) Given that the inverse of the current basis is

$$
\mathbf{B}^{-1} = [\mathbf{a}_1, \mathbf{a}_4, \mathbf{a}_6]^{-1} = \frac{1}{3}
\begin{bmatrix}
1 & 1 & -1 \\
1 & -2 & 2 \\
-1 & 2 & 1
\end{bmatrix}
$$

and the corresponding cost coefficients are

$$\mathbf{c}_B = (c_1, c_4, c_6) = (-1, -3, 1),$$

find the original problem.

27. In many applications of linear programming it may be sufficient, for practical
purposes, to obtain a solution for which the value of the objective function is
within a predetermined tolerance $\varepsilon$ from the minimum value $z^*$. Stopping the
simplex algorithm at such a solution rather than searching for the true minimum
may considerably reduce the computations.

(a) Consider a linear programming problem for which the sum of the variables is
known to be bounded above by $s$. Let $z_0$ denote the current value of the objective
function at some stage of the simplex algorithm, $(c_j - z_j)$ the corresponding
relative cost coefficients, and

$$M = \max_j\,(z_j - c_j).$$

Show that if $M \le \varepsilon/s$, then $z_0 - z^* \le \varepsilon$.

(b) Consider the transportation problem described in Sect. 2.2 (Example 3). Assuming
this problem is solved by the simplex method and it is sufficient to obtain
a solution within $\varepsilon$ tolerance from the optimal value of the objective function,
specify a stopping criterion for the algorithm in terms of $\varepsilon$ and the parameters
of the problem.

28. A matrix $\mathbf{A}$ is said to be totally unimodular if the determinant of every square
submatrix formed from it has value 0, +1, or -1.

(a) Show that the matrix A defining the equality constraints of a transportation 
problem is totally unimodular. 

(b) In the system of equations Ax = b, assume that A is totally unimodular and 
that all elements of A and b are integers. Show that all basic solutions have 
integer components. 

29. For the arrays below: 

(a) Compute the basic solutions indicated. (Note: They may be infeasible.) 

(b) Write the equations for the basic variables, corresponding to the indicated
basic solutions, in lower triangular form.

30. For the arrays of cost coefficients below, the circled positions indicate basic
variables.

(a) Compute the simplex multipliers.

(b) Write the equations for the simplex multipliers in upper triangular form,
and compare with Part (b) of Exercise 29.



3 © ® 




© 6 ® 


2 © 3 




2 © 3 


© 5 © 




1 © © 



31. Consider the modified transportation problem where there is more available at
origins than is required at destinations (i.e., $\sum_{i=1}^{m} a_i > \sum_{j=1}^{n} b_j$).

$$
\begin{array}{lll}
\text{minimize} & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} & \\
\text{subject to} & \sum_{j=1}^{n} x_{ij} \le a_i, & i = 1, 2, \dots, m \\
& \sum_{i=1}^{m} x_{ij} = b_j, & j = 1, 2, \dots, n \\
& x_{ij} \ge 0, & \text{for all } i, j.
\end{array}
$$

(a) Show how to convert it to an ordinary transportation problem.

(b) Suppose there is a storage cost of $s_i$ per unit at origin $i$ for goods not
transported to a destination. Repeat Part (a) with this assumption.

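32. Solve the following transportation problem, which is an original example of
Hitchcock.

$$
\mathbf{a} = (25,\ 25,\ 50), \qquad
\mathbf{b} = (15,\ 20,\ 30,\ 35), \qquad
\mathbf{C} =
\begin{bmatrix}
10 & 5 & 6 & 7 \\
8 & 2 & 7 & 6 \\
9 & 3 & 4 & 8
\end{bmatrix}
$$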



33. In a transportation problem, suppose that two rows or two columns of the cost 
coefficient array differ by a constant. Show that the problem can be reduced by 
combining those rows or columns. 

34. The transportation problem is often solved more quickly by carefully selecting
the starting basic feasible solution. The matrix minimum technique for finding
a starting solution is: (1) Find the lowest cost unallocated cell in the array,
and allocate the maximum possible to it. (2) Reduce the corresponding row
and column requirements, and drop the row or column having zero remaining
requirement. Go back to Step 1 unless all remaining requirements are zero.

(a) Show that this procedure yields a basic feasible solution.

(b) Apply the method to Exercise 32.

35. The caterer problem. A caterer is booked to cater a banquet each evening for the
next $T$ days. He requires $r_t$ clean napkins on the $t$th day for $t = 1, 2, \dots, T$. He
may send dirty napkins to the laundry, which has two speeds of service: fast
and slow. The napkins sent to the fast service will be ready for the next day's
banquet; those sent to the slow service will be ready for the banquet 2 days later.
Fast and slow service cost $c_1$ and $c_2$ per napkin, respectively, with $c_1 > c_2$. The
caterer may also purchase new napkins at any time at cost $c_0$. He has an initial
stock of $s$ napkins and wishes to minimize the total cost of supplying fresh
napkins.

(a) Formulate the problem as a transportation problem. (Hint: Use $T + 1$ sources
and $T$ destinations.)

(b) Using the values $T = 4$, $s = 200$, $r_1 = 100$, $r_2 = 130$, $r_3 = 150$, $r_4 = 140$,
$c_1 = 6$, $c_2 = 4$, $c_0 = 12$, solve the problem.

36. The marriage assignment problem. A group of $n$ men and $n$ women live on an
island. The amount of happiness that the $i$th man and the $j$th woman derive by
spending a fraction $x_{ij}$ of their lives together is $c_{ij}x_{ij}$. What is the nature of the
living arrangements that maximizes the total happiness of the islanders?

37. Anticycling Rule. A remarkably simple procedure for avoiding cycling was 
developed by Bland, and we discuss it here. 

Bland's Rule. In the simplex method:

(a) Select the column to enter the basis by $j = \min\{j : r_j < 0\}$; that is, select the
lowest indexed favorable column.

(b) In case ties occur in the criterion for determining which column is to leave the
basis, select the one with lowest index.

We can prove by contradiction that the use of Bland’s rule prohibits cycling. 
Suppose that cycling occurs. During the cycle a finite number of columns enter 
and leave the basis. Each of these columns enters at level zero, and the cost 
function does not change. 

Delete all rows and columns that do not contain pivots during a cycle, obtaining
a new linear program that also cycles. Assume that this reduced linear program
has $m$ rows and $n$ columns. Consider the solution stage where column $n$ is about
to leave the basis, being replaced by column $p$. The corresponding tableau is as
follows (where the entries shown are explained below):

$$
\begin{array}{c|ccc|c}
 & \mathbf{a}_p & \cdots & \mathbf{a}_n & \mathbf{b} \\
\hline
 & \le 0 & & 0 & 0 \\
 & \le 0 & & 0 & 0 \\
 & > 0 & & 1 & 0 \\
\hline
\mathbf{c}^T & < 0 & & 0 & 0
\end{array}
$$

Without loss of generality, we assume that the current basis consists of the last
$m$ columns. In fact, we may define the reduced linear program in terms of this
tableau, calling the current coefficient array $\mathbf{A}$ and the current relative cost
vector $\mathbf{c}$. In this tableau we pivot on $a_{mp}$, so $a_{mp} > 0$. By Part (b) of Bland's rule,
$\mathbf{a}_n$ can leave the basis only if there are no ties in the ratio test, and since $\mathbf{b} = \mathbf{0}$
because all rows are in the cycle, it follows that $a_{ip} \le 0$ for all $i \ne m$.

Now consider the situation when column $n$ is about to reenter the basis. Part (a)
of Bland's rule ensures that $r_n < 0$ and $r_i \ge 0$ for all $i \ne n$. Apply the formula
$r_i = c_i - \mathbf{y}^T\mathbf{a}_i$ to the last $m$ columns to show that each component of $\mathbf{y}$ except $y_m$
is nonpositive; and $y_m > 0$. Then use this to show that $r_p = c_p - \mathbf{y}^T\mathbf{a}_p < c_p < 0$,
contradicting $r_p \ge 0$.

38. Use the Dantzig-Wolfe decomposition method to solve

$$
\begin{array}{ll}
\text{minimize} & -4x_1 - x_2 - 3x_3 - 2x_4 \\
\text{subject to} & 2x_1 + 2x_2 + x_3 + 2x_4 \le 6 \\
& x_2 + 2x_3 + 3x_4 \le 4 \\
& 2x_1 + x_2 \le 5 \\
& x_2 \le 1 \\
& -x_3 + 2x_4 \le 2 \\
& x_3 + 2x_4 \le 6 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0.
\end{array}
$$

3.8 For a more comprehensive description of the Dantzig and Wolfe [D11]
decomposition method, see Dantzig [D6].

The degeneracy technique discussed in Exercises 15-17 is due to Charnes
[C2]. The anticycling method of Exercise 37 is due to Bland [B19]. For the
state of the art in Simplex solvers see Bixby [B18].



Chapter 4 

Duality and Complementarity 



Associated with every linear program, and intimately related to it, is a corresponding 
dual linear program. Both programs are constructed from the same underlying cost 
and constraint coefficients but in such a way that if one of these problems is one of 
minimization the other is one of maximization, and the optimal values of the corre- 
sponding objective functions, if finite, are equal. The variables of the dual problem 
can be interpreted as prices associated with the constraints of the original (primal) 
problem, and through this association it is possible to give an economically mean- 
ingful characterization to the dual whenever there is such a characterization for the 
primal. 

The variables of the dual problem are also intimately related to the calculation of 
the relative cost coefficients in the simplex method. Thus, a study of duality sharp- 
ens our understanding of the simplex procedure and motivates certain alternative 
solution methods. Indeed, the simultaneous consideration of a problem from both 
the primal and dual viewpoints often provides significant computational advantage 
as well as economic insight. 



4.1 Dual Linear Programs 

In this section we define the dual program that is associated with a given linear pro- 
gram. Initially, we depart from our usual strategy of considering programs in stan- 
dard form, since the duality relationship is most symmetric for programs expressed 
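solely in terms of inequalities. Specifically then, we define duality through the pair
of programs displayed below.

$$
\begin{array}{llcll}
 & \textbf{Primal} & \qquad & & \textbf{Dual} \\
\text{minimize} & \mathbf{c}^T\mathbf{x} & & \text{maximize} & \mathbf{y}^T\mathbf{b} \\
\text{subject to} & \mathbf{Ax} \ge \mathbf{b} & & \text{subject to} & \mathbf{y}^T\mathbf{A} \le \mathbf{c}^T \\
 & \mathbf{x} \ge \mathbf{0} & & & \mathbf{y} \ge \mathbf{0}
\end{array}
\tag{4.1}
$$

If $\mathbf{A}$ is an $m \times n$ matrix, then $\mathbf{x}$ is an $n$-dimensional column vector, $\mathbf{b}$ is an
$m$-dimensional column vector, $\mathbf{c}^T$ is an $n$-dimensional row vector, and $\mathbf{y}^T$ is an
$m$-dimensional row vector. The vector $\mathbf{x}$ is the variable of the primal program, and $\mathbf{y}$
is the variable of the dual program.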

The pair of programs (4.1) is called the symmetric form of duality and, as ex- 
plained below, can be used to define the dual of any linear program. It is important 
to note that the role of primal and dual can be reversed. Thus, studying in detail 
the process by which the dual is obtained from the primal: interchange of cost and 
constraint vectors, transposition of coefficient matrix, reversal of constraint inequal- 
ities, and change of minimization to maximization; we see that this same process 
applied to the dual yields the primal. Put another way, if the dual is transformed, 
by multiplying the objective and the constraints by minus unity, so that it has the 
structure of the primal (but is still expressed in terms of y), its corresponding dual 
will be equivalent to the original primal. 
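
The reversal of roles described above can be checked mechanically. The following is a minimal sketch, not part of the original development: it stores a symmetric-form primal as the triple (c, A, b), rewrites its dual in the same minimization shape by multiplying through by minus unity, and applies the construction twice. The function names and the numerical data are illustrative assumptions.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def dual_as_primal(c, A, b):
    """Take (c, A, b) describing  min c^T x  s.t.  A x >= b, x >= 0.

    Its dual is  max b^T y  s.t.  A^T y <= c, y >= 0.  Multiplying the
    objective and the constraints by -1 rewrites that dual as
    min (-b)^T y  s.t.  (-A^T) y >= -c, y >= 0, which has the primal's shape.
    """
    neg_At = [[-a for a in row] for row in transpose(A)]
    return [-bi for bi in b], neg_At, [-ci for ci in c]


# Illustrative data (hypothetical, not from the text): 3 constraints, 2 variables.
c = [2, 3]
A = [[1, 1],
     [1, 2],
     [3, 1]]
b = [4, 5, 6]

dual = dual_as_primal(c, A, b)
dual_of_dual = dual_as_primal(*dual)

print(dual)
print(dual_of_dual == (c, A, b))
```

Applying the function twice returns the original triple, which is the statement that the dual of the dual is the primal.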

The dual of any linear program can be found by converting the program to the 
form of the primal shown above. For example, given a linear program in standard 
form 

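$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{Ax} = \mathbf{b},\ \mathbf{x} \ge \mathbf{0},
\end{array}
$$

we write it in the equivalent form

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{Ax} \ge \mathbf{b} \\
& -\mathbf{Ax} \ge -\mathbf{b} \\
& \mathbf{x} \ge \mathbf{0},
\end{array}
$$

which is in the form of the primal of (4.1) but with coefficient matrix
$\begin{bmatrix} \mathbf{A} \\ -\mathbf{A} \end{bmatrix}$. Using a dual vector partitioned as $(\mathbf{u}, \mathbf{v})$, the corresponding dual is

$$
\begin{array}{ll}
\text{maximize} & \mathbf{u}^T\mathbf{b} - \mathbf{v}^T\mathbf{b} \\
\text{subject to} & \mathbf{u}^T\mathbf{A} - \mathbf{v}^T\mathbf{A} \le \mathbf{c}^T \\
& \mathbf{u} \ge \mathbf{0},\ \mathbf{v} \ge \mathbf{0}.
\end{array}
$$

Letting $\mathbf{y} = \mathbf{u} - \mathbf{v}$ we may simplify the representation of the dual program so that
we obtain the pair of problems displayed below:

$$
\begin{array}{llcll}
 & \textbf{Primal} & \qquad & & \textbf{Dual} \\
\text{minimize} & \mathbf{c}^T\mathbf{x} & & \text{maximize} & \mathbf{y}^T\mathbf{b} \\
\text{subject to} & \mathbf{Ax} = \mathbf{b},\ \mathbf{x} \ge \mathbf{0} & & \text{subject to} & \mathbf{y}^T\mathbf{A} \le \mathbf{c}^T.
\end{array}
\tag{4.2}
$$

This is the asymmetric form of the duality relation. In this form the dual vector $\mathbf{y}$
(which is really a composite of $\mathbf{u}$ and $\mathbf{v}$) is not restricted to be nonnegative.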

Similar transformations can be worked out for any linear program to first get the 
primal in the form (4.1), calculate the dual, and then simplify the dual to account 
for special structure. 

In general, if some of the linear inequalities in the primal (4.1) are changed to
equality, the corresponding components of $\mathbf{y}$ in the dual become free variables.
If some of the components of $\mathbf{x}$ in the primal are free variables, then the corresponding
inequalities in $\mathbf{y}^T\mathbf{A} \le \mathbf{c}^T$ are changed to equality in the dual. We mention
again that these are not arbitrary rules but are direct consequences of the original
definition and the equivalence of various forms of linear programs.

Example 1 (Dual of the Diet Problem). The diet problem, Example 1, Sect. 2.2, was
the problem faced by a dietitian trying to select a combination of foods to meet
certain nutritional requirements at minimum cost. This problem has the form

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{Ax} \ge \mathbf{b},\ \mathbf{x} \ge \mathbf{0}
\end{array}
$$

and hence can be regarded as the primal program of the symmetric pair above. We
describe an interpretation of the dual problem.

Imagine a pharmaceutical company that produces in pill form each of the
nutrients considered important by the dietitian. The pharmaceutical company tries
to convince the dietitian to buy pills, and thereby supply the nutrients directly rather
than through purchase of various foods. The problem faced by the drug company
is that of determining positive unit prices $y_1, y_2, \dots, y_m$ for the nutrients so as to
maximize revenue while at the same time being competitive with real food. To be
competitive with real food, the cost of a unit of food $i$ made synthetically from
pure nutrients bought from the druggist must be no greater than $c_i$, the market price
of the food. Thus, denoting by $\mathbf{a}_i$ the $i$th food, the company must satisfy $\mathbf{y}^T\mathbf{a}_i \le c_i$
for each $i$. In matrix form this is equivalent to $\mathbf{y}^T\mathbf{A} \le \mathbf{c}^T$. Since $b_j$ units of the $j$th
nutrient will be purchased, the problem of the druggist is

$$
\begin{array}{ll}
\text{maximize} & \mathbf{y}^T\mathbf{b} \\
\text{subject to} & \mathbf{y}^T\mathbf{A} \le \mathbf{c}^T,\ \mathbf{y} \ge \mathbf{0},
\end{array}
$$

which is the dual problem.

Example 2 (Dual of the Transportation Problem). The transportation problem, 
Example 3, Sect. 2.2, is the problem, faced by a manufacturer, of selecting the 
pattern of product shipments between several fixed origins and destinations so as to 
minimize transportation cost while satisfying demand. Referring to (4.6) and (4.7) 
of Chap. 2, the problem is in standard form, and hence the asymmetric version of 
the duality relation applies. There is a dual variable for each constraint. In this case 
we denote the variables $u_i$, $i = 1, 2, \dots, m$ for (4.6) and $v_j$, $j = 1, 2, \dots, n$ for (4.7).
Accordingly, the dual is

$$
\begin{array}{ll}
\text{maximize} & \sum_{i=1}^{m} a_i u_i + \sum_{j=1}^{n} b_j v_j \\
\text{subject to} & u_i + v_j \le c_{ij}, \quad i = 1, 2, \dots, m, \quad j = 1, 2, \dots, n.
\end{array}
$$



To interpret the dual problem, we imagine an entrepreneur who, feeling that he can 
ship more efficiently, comes to the manufacturer with the offer to buy his product at 
the plant sites (origins) and sell it at the warehouses (destinations). The product price 
that is to be used in these transactions varies from point to point, and is determined 
by the entrepreneur in advance. He must choose these prices, of course, so that his 
offer will be attractive to the manufacturer. 

The entrepreneur, then, must select prices $-u_1, -u_2, \dots, -u_m$ for the $m$ origins
and $v_1, v_2, \dots, v_n$ for the $n$ destinations. To be competitive with usual transportation
modes, his prices must satisfy $u_i + v_j \le c_{ij}$ for all $i, j$, since $u_i + v_j$ represents the
net amount the manufacturer must pay to sell a unit of product at origin $i$ and buy
it back again at destination $j$. Subject to this constraint, the entrepreneur will adjust
his prices to maximize his revenue. Thus, his problem is as given above.
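
As a small numerical illustration of the entrepreneur's problem, the sketch below checks a candidate set of prices against the constraints $u_i + v_j \le c_{ij}$ and compares the resulting revenue with the cost of one feasible shipping pattern. The two-origin, two-destination data, the shipment x, and the prices u, v are hypothetical values chosen for the illustration; they do not come from the text.

```python
def shipment_feasible(a, b, x):
    """Nonnegative shipments whose row sums equal supplies a and column sums equal demands b."""
    rows_ok = all(sum(row) == ai for row, ai in zip(x, a))
    cols_ok = all(sum(col) == bj for col, bj in zip(zip(*x), b))
    return rows_ok and cols_ok and all(xij >= 0 for row in x for xij in row)


def shipping_cost(cost, x):
    """Primal objective: sum of c_ij * x_ij."""
    return sum(cij * xij for crow, xrow in zip(cost, x) for cij, xij in zip(crow, xrow))


def prices_competitive(cost, u, v):
    """Dual feasibility: u_i + v_j <= c_ij for every origin i and destination j."""
    return all(u[i] + v[j] <= cost[i][j] for i in range(len(u)) for j in range(len(v)))


def entrepreneur_revenue(a, b, u, v):
    """Dual objective: sum of a_i * u_i plus sum of b_j * v_j."""
    return sum(ai * ui for ai, ui in zip(a, u)) + sum(bj * vj for bj, vj in zip(b, v))


# Hypothetical data: supplies, demands, unit costs, one shipment, one price system.
a = [3, 2]
b = [2, 3]
cost = [[1, 3],
        [2, 1]]
x = [[2, 1],
     [0, 2]]
u = [0, -2]
v = [1, 3]

print(shipment_feasible(a, b, x), shipping_cost(cost, x))
print(prices_competitive(cost, u, v), entrepreneur_revenue(a, b, u, v))
```

For this data the shipping cost and the entrepreneur's revenue are both 7, so at these prices the manufacturer is indifferent between shipping himself and accepting the offer.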



4.2 The Duality Theorem 

To this point the relation between the primal and dual programs has been simply a 
formal one based on what might appear as an arbitrary definition. In this section, 
however, the deeper connection between a program and its dual, as expressed by the 
Duality Theorem, is derived. 

The proof of the Duality Theorem given in this section relies on the Separating 
Hyperplane Theorem (Appendix B) and is therefore somewhat more advanced than 
previous arguments. It is given here so that the most general form of the Duality 
Theorem is established directly. An alternative approach is to use the theory of the 
simplex method to derive the duality result. A simplified version of this alternative 
approach is given in the next section. 

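Throughout this section we consider the primal program in standard form

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{Ax} = \mathbf{b},\ \mathbf{x} \ge \mathbf{0}
\end{array}
\tag{4.3}
$$

and its corresponding dual

$$
\begin{array}{ll}
\text{maximize} & \mathbf{y}^T\mathbf{b} \\
\text{subject to} & \mathbf{y}^T\mathbf{A} \le \mathbf{c}^T.
\end{array}
\tag{4.4}
$$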



In this section it is not assumed that $\mathbf{A}$ is necessarily of full rank. The following
lemma is easily established and gives us an important relation between the two
problems.

Fig. 4.1 Relation of primal and dual values (regions labeled: dual values, primal values)

Lemma 1 (Weak Duality Lemma). If $\mathbf{x}$ and $\mathbf{y}$ are feasible for (4.3) and (4.4), respectively,
then $\mathbf{c}^T\mathbf{x} \ge \mathbf{y}^T\mathbf{b}$.



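Proof. We have

$$\mathbf{y}^T\mathbf{b} = \mathbf{y}^T\mathbf{Ax} \le \mathbf{c}^T\mathbf{x},$$

the last inequality being valid since $\mathbf{x} \ge \mathbf{0}$ and $\mathbf{y}^T\mathbf{A} \le \mathbf{c}^T$. ∎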
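
The inequality of Lemma 1 can be observed on a small instance. The sketch below is only an illustration under assumed data: the matrix A, the vectors b and c, and the lists of feasible points are hypothetical and were picked by hand. It verifies feasibility of each point and then compares every primal value with every dual value.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_feasible(A, b, x):
    """x >= 0 and A x = b, checked exactly (integer data)."""
    return all(xi >= 0 for xi in x) and all(dot(row, x) == bi for row, bi in zip(A, b))


def dual_feasible(A, c, y):
    """y^T A <= c^T, checked column by column; y is unrestricted in sign."""
    return all(dot(y, col) <= cj for col, cj in zip(zip(*A), c))


# Hypothetical standard-form data (2 constraints, 4 variables).
A = [[1, 2, 1, 0],
     [3, 1, 0, 1]]
b = [4, 6]
c = [2, 3, 0, 0]

primal_points = [[1, 1, 1, 2], [2, 0, 2, 0], [0, 2, 0, 4], [0, 0, 4, 6]]
dual_points = [[0, 0], [-1, 0], [0, -1]]

# Each pair holds (c^T x, y^T b) for one feasible x and one feasible y.
pairs = [(dot(c, x), dot(y, b)) for x in primal_points for y in dual_points]

print(all(primal_feasible(A, b, x) for x in primal_points))
print(all(dual_feasible(A, c, y) for y in dual_points))
print(all(p >= d for p, d in pairs))
```

Every primal value in the list is at least as large as every dual value, and the pair x = (0, 0, 4, 6), y = (0, 0) attains equality.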



This lemma shows that a feasible vector to either problem yields a bound on the 
value of the other problem. The values associated with the primal are all larger than 
the values associated with the dual as illustrated in Fig. 4.1. Since the primal seeks 
a minimum and the dual seeks a maximum, each seeks to reach the other. From this 
we have an important corollary. 

Corollary. If $\mathbf{x}_0$ and $\mathbf{y}_0$ are feasible for (4.3) and (4.4), respectively, and if $\mathbf{c}^T\mathbf{x}_0 = \mathbf{y}_0^T\mathbf{b}$,
then $\mathbf{x}_0$ and $\mathbf{y}_0$ are optimal for their respective problems.

The above corollary shows that if a pair of feasible vectors can be found to the 
primal and dual programs with equal objective values, then these are both optimal. 
The Duality Theorem of linear programming states that the converse is also true, 
and that, in fact, the two regions in Fig. 4.1 actually have a common point; there is 
no “gap.” 

Duality Theorem of Linear Programming. If either of the problems (4.3) or (4.4) has a 
finite optimal solution, so does the other, and the corresponding values of the objective 
functions are equal. If either problem has an unbounded objective, the other problem has 
no feasible solution. 



Proof. We note first that the second statement is an immediate consequence of
Lemma 1. For if the primal is unbounded and $\mathbf{y}$ is feasible for the dual, we must
have $\mathbf{y}^T\mathbf{b} \le -M$ for arbitrarily large $M$, which is clearly impossible.

Second we note that although the primal and dual are not stated in symmetric 
form it is sufficient, in proving the first statement, to assume that the primal has 
a finite optimal solution and then show that the dual has a solution with the same 
value. This follows because either problem can be converted to standard form and 
because the roles of primal and dual are reversible. 

Suppose (4.3) has a finite optimal solution with value $z_0$. In the space $E^{m+1}$ define
the convex set

$$C = \{(r, \mathbf{w}) : r = tz_0 - \mathbf{c}^T\mathbf{x},\ \mathbf{w} = t\mathbf{b} - \mathbf{Ax},\ \mathbf{x} \ge \mathbf{0},\ t \ge 0\}.$$

It is easily verified that $C$ is in fact a closed convex cone. We show that the point
$(1, \mathbf{0})$ is not in $C$. If $\mathbf{w} = t_0\mathbf{b} - \mathbf{Ax}_0 = \mathbf{0}$ with $t_0 > 0$, $\mathbf{x}_0 \ge \mathbf{0}$, then $\mathbf{x} = \mathbf{x}_0/t_0$ is feasible
for (4.3) and hence $r/t_0 = z_0 - \mathbf{c}^T\mathbf{x} \le 0$; which means $r \le 0$. If $\mathbf{w} = -\mathbf{Ax}_0 = \mathbf{0}$
with $\mathbf{x}_0 \ge \mathbf{0}$ and $\mathbf{c}^T\mathbf{x}_0 = -1$, and if $\mathbf{x}$ is any feasible solution to (4.3), then $\mathbf{x} + \alpha\mathbf{x}_0$ is
feasible for any $\alpha \ge 0$ and gives arbitrarily small objective values as $\alpha$ is increased.
This contradicts our assumption on the existence of a finite optimum and thus we
conclude that no such $\mathbf{x}_0$ exists. Hence $(1, \mathbf{0}) \notin C$.

Now since $C$ is a closed convex set, there is by Theorem 4.4, Sect. B.3, a hyperplane
separating $(1, \mathbf{0})$ and $C$. Thus there is a nonzero vector $[s, \mathbf{y}] \in E^{m+1}$ and a
constant $c$ such that

$$s < c = \inf\{sr + \mathbf{y}^T\mathbf{w} : (r, \mathbf{w}) \in C\}.$$

Now since $C$ is a cone, it follows that $c \ge 0$. For if there were $(r, \mathbf{w}) \in C$ such that
$sr + \mathbf{y}^T\mathbf{w} < 0$, then $\alpha(r, \mathbf{w})$ for large $\alpha$ would violate the hyperplane inequality. On
the other hand, since $(0, \mathbf{0}) \in C$ we must have $c \le 0$. Thus $c = 0$. As a consequence
$s < 0$, and without loss of generality we may assume $s = -1$.

We have to this point established the existence of $\mathbf{y} \in E^m$ such that

$$-r + \mathbf{y}^T\mathbf{w} \ge 0$$

for all $(r, \mathbf{w}) \in C$. Equivalently, using the definition of $C$,

$$(\mathbf{c}^T - \mathbf{y}^T\mathbf{A})\mathbf{x} - tz_0 + t\mathbf{y}^T\mathbf{b} \ge 0$$

for all $\mathbf{x} \ge \mathbf{0}$, $t \ge 0$. Setting $t = 0$ yields $\mathbf{y}^T\mathbf{A} \le \mathbf{c}^T$, which says $\mathbf{y}$ is feasible for the
dual. Setting $\mathbf{x} = \mathbf{0}$ and $t = 1$ yields $\mathbf{y}^T\mathbf{b} \ge z_0$, which in view of Lemma 1 and its
corollary shows that $\mathbf{y}$ is optimal for the dual. ∎



4.3 Relations to the Simplex Procedure 

In this section the Duality Theorem is proved by making explicit use of the char- 
acteristics of the simplex procedure. As a result of this proof it becomes clear that 
once the primal is solved by the simplex procedure a solution to the dual is readily 
obtainable. 

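Suppose that for the linear program

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{Ax} = \mathbf{b},\ \mathbf{x} \ge \mathbf{0},
\end{array}
\tag{4.5}
$$

we have the optimal basic feasible solution $\mathbf{x} = (\mathbf{x}_B, \mathbf{0})$ with corresponding basis $\mathbf{B}$.
We shall determine a solution of the dual program

$$
\begin{array}{ll}
\text{maximize} & \mathbf{y}^T\mathbf{b} \\
\text{subject to} & \mathbf{y}^T\mathbf{A} \le \mathbf{c}^T
\end{array}
\tag{4.6}
$$

in terms of $\mathbf{B}$.

We partition $\mathbf{A}$ as $\mathbf{A} = [\mathbf{B}, \mathbf{D}]$. Since the basic feasible solution $\mathbf{x}_B = \mathbf{B}^{-1}\mathbf{b}$ is
optimal, the relative cost vector $\mathbf{r}$ must be nonnegative in each component. From
Sect. 3.6 we have

$$\mathbf{r}_D^T = \mathbf{c}_D^T - \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{D},$$

and since $\mathbf{r}_D$ is nonnegative in each component we have $\mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{D} \le \mathbf{c}_D^T$.

Now define $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$. We show that this choice of $\mathbf{y}$ solves the dual problem.
We have

$$\mathbf{y}^T\mathbf{A} = [\mathbf{y}^T\mathbf{B},\ \mathbf{y}^T\mathbf{D}] = [\mathbf{c}_B^T,\ \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{D}] \le [\mathbf{c}_B^T,\ \mathbf{c}_D^T] = \mathbf{c}^T.$$

Thus since $\mathbf{y}^T\mathbf{A} \le \mathbf{c}^T$, $\mathbf{y}$ is feasible for the dual. On the other hand,

$$\mathbf{y}^T\mathbf{b} = \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{b} = \mathbf{c}_B^T\mathbf{x}_B,$$

and thus the value of the dual objective function for this $\mathbf{y}$ is equal to the value of
the primal problem. This, in view of Lemma 1, Sect. 4.2, establishes the optimality
of $\mathbf{y}$ for the dual. The above discussion yields an alternative derivation of the main
portion of the Duality Theorem.

Theorem. Let the linear program (4.5) have an optimal basic feasible solution corresponding
to the basis $\mathbf{B}$. Then the vector $\mathbf{y}$ satisfying $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$ is an optimal solution to the
dual program (4.6). The optimal values of both problems are equal.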

We turn now to a discussion of how the solution of the dual can be obtained
directly from the final simplex tableau of the primal. Suppose that embedded in the
original matrix $\mathbf{A}$ is an $m \times m$ identity matrix. This will be the case if, for example,
$m$ slack variables are employed to convert inequalities to equalities. Then in the
final tableau the matrix $\mathbf{B}^{-1}$ appears where the identity appeared in the beginning.
Furthermore, in the last row the components corresponding to this identity matrix
will be $\mathbf{c}_I^T - \mathbf{c}_B^T\mathbf{B}^{-1}$, where $\mathbf{c}_I$ is the $m$-vector representing the cost coefficients of
the variables corresponding to the columns of the original identity matrix. Thus by
subtracting these cost coefficients from the corresponding elements in the last row,
the negative of the solution $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$ to the dual is obtained. In particular, if, as
is the case with slack variables, $\mathbf{c}_I = \mathbf{0}$, then the elements in the last row under $\mathbf{B}^{-1}$
are equal to the negative of components of the solution to the dual.

Example. Consider the primal program

$$
\begin{array}{ll}
\text{minimize} & -x_1 - 4x_2 - 3x_3 \\
\text{subject to} & 2x_1 + 2x_2 + x_3 \le 4 \\
& x_1 + 2x_2 + 2x_3 \le 6 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{array}
$$

This can be solved by introducing slack variables and using the simplex procedure.
The appropriate sequence of tableaus is given below without explanation.

The optimal solution is $x_1 = 0$, $x_2 = 1$, $x_3 = 2$. The corresponding dual program is

$$
\begin{array}{ll}
\text{maximize} & 4y_1 + 6y_2 \\
\text{subject to} & 2y_1 + y_2 \le -1 \\
& 2y_1 + 2y_2 \le -4 \\
& y_1 + 2y_2 \le -3 \\
& y_1 \le 0,\ y_2 \le 0.
\end{array}
$$

The optimal solution to the dual is obtained directly from the last row of the simplex
tableau under the columns where the identity appeared in the first tableau:
$y_1 = -1$, $y_2 = -1$.
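
The dual solution quoted in this example can be reproduced from the formula $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$. The following is a minimal sketch using exact fractions; it assumes the slack columns are appended after $x_1, x_2, x_3$ and that the optimal basis consists of the columns of $x_2$ and $x_3$ (consistent with the stated optimal solution). The helper names are illustrative.

```python
from fractions import Fraction


def inverse_2x2(B):
    """Closed-form inverse of a 2x2 matrix, in exact arithmetic."""
    (p, q), (r, s) = B
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]


# Columns: x1, x2, x3, slack1, slack2 (slack ordering is an assumption).
A = [[2, 2, 1, 1, 0],
     [1, 2, 2, 0, 1]]
b = [4, 6]
c = [-1, -4, -3, 0, 0]
basis = [1, 2]  # 0-based indices of x2 and x3

B = [[A[i][j] for j in basis] for i in range(2)]
c_B = [c[j] for j in basis]
B_inv = inverse_2x2(B)

# Basic solution x_B = B^{-1} b and simplex multipliers y^T = c_B^T B^{-1}.
x_B = [sum(B_inv[i][k] * b[k] for k in range(2)) for i in range(2)]
y = [sum(c_B[i] * B_inv[i][j] for i in range(2)) for j in range(2)]

# Last row of the final tableau: r_j = c_j - y^T a_j for every column.
last_row = [c[j] - sum(y[i] * A[i][j] for i in range(2)) for j in range(5)]

primal_value = sum(cb * xb for cb, xb in zip(c_B, x_B))
dual_value = sum(yi * bi for yi, bi in zip(y, b))

print(x_B, y, last_row, primal_value, dual_value)
```

The entries of the last row under the two slack columns are the negatives of $y_1$ and $y_2$, and the primal and dual objective values agree.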



Geometric Interpretation 



The duality relations can be viewed in terms of the dual interpretations of linear
constraints emphasized in Chap. 3. Consider a linear program in standard form. For
sake of concreteness we consider the problem

$$
\begin{array}{ll}
\text{minimize} & 18x_1 + 12x_2 + 2x_3 + 6x_4 \\
\text{subject to} & 3x_1 + x_2 - 2x_3 + x_4 = 2 \\
& x_1 + 3x_2 - x_4 = 2 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0.
\end{array}
$$

The columns of the constraints are represented in requirements space in Fig. 4.2.
A basic solution represents construction of $\mathbf{b}$ with positive weights on two of the $\mathbf{a}_i$'s.
The dual problem is

$$
\begin{array}{ll}
\text{maximize} & 2y_1 + 2y_2 \\
\text{subject to} & 3y_1 + y_2 \le 18 \\
& y_1 + 3y_2 \le 12 \\
& -2y_1 \le 2 \\
& y_1 - y_2 \le 6.
\end{array}
$$



The dual problem is shown geometrically in Fig. 4.3. Each column $\mathbf{a}_i$ of the primal
defines a constraint of the dual as a half-space whose boundary is orthogonal
to that column vector and is located at a point determined by $c_i$. The dual objective
is maximized at an extreme point of the dual feasible region. At this point exactly
two dual constraints are active. These active constraints correspond to an optimal
basis of the primal. In fact, the vector defining the dual objective is a positive linear
combination of the vectors. In the specific example, $\mathbf{b}$ is a positive combination of
$\mathbf{a}_1$ and $\mathbf{a}_2$. The weights in this combination are the $x_i$'s in the solution of the primal.

Fig. 4.2 The primal requirements space






Fig. 4.3 The dual in activity space



Simplex Multipliers



We conclude this section by giving an economic interpretation of the relation between the simplex basis and the vector $\mathbf{y}$. At any point in the simplex procedure we may form the vector $\mathbf{y}$ satisfying $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$. This vector is not a solution to the dual unless $\mathbf{B}$ is an optimal basis for the primal, but nevertheless, it has an economic interpretation. Furthermore, as we have seen in the development of the revised simplex method, this $\mathbf{y}$ vector can be used at every step to calculate the relative cost coefficients. For this reason $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$, corresponding to any basis, is often called the vector of simplex multipliers.

Let us pursue the economic interpretation of these simplex multipliers. As usual, denote the columns of $\mathbf{A}$ by $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ and denote by $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_m$ the $m$ unit vectors in $E^m$. The components of the $\mathbf{a}_i$'s and $\mathbf{b}$ tell how to construct these vectors from the $\mathbf{e}_i$'s.

Given any basis $\mathbf{B}$, however, consisting of $m$ columns of $\mathbf{A}$, any other vector can be constructed (synthetically) as a linear combination of these basis vectors. If there is a unit cost $c_i$ associated with each basis vector $\mathbf{a}_i$, then the cost of a (synthetic) vector constructed from the basis can be calculated as the corresponding linear combination of the $c_i$'s associated with the basis. In particular, the cost of the $j$th unit vector, $\mathbf{e}_j$, when constructed from the basis $\mathbf{B}$, is $y_j$, the $j$th component of $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$. Thus the $y_j$'s can be interpreted as synthetic prices of the unit vectors.

Now, any vector can be expressed in terms of the basis B in two steps: (1) express 
the unit vectors in terms of the basis, and then (2) express the desired vector as a 
linear combination of unit vectors. The corresponding synthetic cost of a vector con- 
structed from the basis B can correspondingly be computed directly by: (1) finding 
the synthetic price of the unit vectors, and then (2) using these prices to evaluate the 
cost of the linear combination of unit vectors. Thus, the simplex multipliers can be 
used to quickly evaluate the synthetic cost of any vector that is expressed in terms of 
the unit vectors. The difference between the true cost of this vector and the synthetic 
cost is the relative cost. The process of calculating the synthetic cost of a vector, 
with respect to a given basis, by using the simplex multipliers is sometimes referred 
to as pricing out the vector. 
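
As a small illustration of pricing out, the following sketch computes the simplex multipliers and the relative costs for the example of Sect. 4.3. It rests on a few assumptions: the primal data `A` and `c` are read off the dual constraints stated in that example, the basis consists of the columns $\mathbf{a}_1$ and $\mathbf{a}_2$ identified there as optimal, and the helper names `solve`, `column`, and `synthetic_cost` are ours and do not come from the text.

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Solve the square system M z = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Primal data inferred from the dual example of Sect. 4.3 (an assumption).
A = [[3, 1, -2, 1],
     [1, 3, 0, -1]]
c = [18, 12, 2, 6]
basis = [0, 1]                      # columns a_1 and a_2

def column(j):
    return [row[j] for row in A]

# y^T = c_B^T B^{-1} is equivalent to solving B^T y = c_B.
B_T = [column(j) for j in basis]    # the rows of B^T are the basis columns
y = solve(B_T, [c[j] for j in basis])

def synthetic_cost(vec):
    """Price out a vector: the cost y^T vec of building it from the basis."""
    return sum(yi * vi for yi, vi in zip(y, vec))

# Relative cost = true cost minus synthetic cost.
relative_costs = [c[j] - synthetic_cost(column(j)) for j in range(len(c))]
print(y)                # y = (21/4, 9/4)
print(relative_costs)   # (0, 0, 25/2, 3): all nonnegative, so the basis is optimal
```

Solving $\mathbf{B}^T\mathbf{y} = \mathbf{c}_B$ instead of forming $\mathbf{B}^{-1}$ explicitly is a design choice of this sketch; either route gives the same multipliers.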

Optimality of the primal corresponds to the situation where every vector $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ is cheaper when constructed from the basis than when purchased directly at its own price. Thus we have $\mathbf{y}^T\mathbf{a}_i \le c_i$ for $i = 1, 2, \ldots, n$ or equivalently $\mathbf{y}^T\mathbf{A} \le \mathbf{c}^T$.



4.4 Sensitivity and Complementary Slackness 

The optimal values of the dual variables in a linear program can, as we have seen, be
interpreted as prices. In this section this interpretation is explored in further detail.

Sensitivity



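Suppose in the linear program

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{A}\mathbf{x} = \mathbf{b},\quad \mathbf{x} \ge \mathbf{0},
\end{array}
$$

the optimal basis is $\mathbf{B}$ with corresponding solution $(\mathbf{x}_B, \mathbf{0})$, where $\mathbf{x}_B = \mathbf{B}^{-1}\mathbf{b}$. A solution to the corresponding dual is $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$.

Now, assuming nondegeneracy, small changes in the vector $\mathbf{b}$ will not cause the optimal basis to change. Thus for $\mathbf{b} + \Delta\mathbf{b}$ the optimal solution is

$$\mathbf{x} = (\mathbf{x}_B + \Delta\mathbf{x}_B, \mathbf{0}),$$

where $\Delta\mathbf{x}_B = \mathbf{B}^{-1}\Delta\mathbf{b}$. Thus the corresponding increment in the cost function is

$$\Delta z = \mathbf{c}_B^T\Delta\mathbf{x}_B = \mathbf{y}^T\Delta\mathbf{b}. \tag{4.8}$$

This equation shows that $\mathbf{y}$ gives the sensitivity of the optimal cost with respect to small changes in the vector $\mathbf{b}$. In other words, if a new program were solved with $\mathbf{b}$ changed to $\mathbf{b} + \Delta\mathbf{b}$, the change in the optimal value of the objective function would be $\mathbf{y}^T\Delta\mathbf{b}$.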

This interpretation of the dual vector $\mathbf{y}$ is intimately related to its interpretation as a vector of simplex multipliers. Since $y_j$ is the price of the unit vector $\mathbf{e}_j$ when constructed from the basis $\mathbf{B}$, it directly measures the change in cost due to a change in the $j$th component of the vector $\mathbf{b}$. Thus, $y_j$ may equivalently be considered as the marginal price of the component $b_j$, since if $b_j$ is changed to $b_j + \Delta b_j$ the value of the optimal solution changes by $y_j\Delta b_j$.

If the linear program is interpreted as a diet problem, for instance, then $y_j$ is the maximum price per unit that the dietitian would be willing to pay for a small amount of the $j$th nutrient, because decreasing the amount of nutrient that must be supplied by food will reduce the food bill by $y_j$ dollars per unit. If, as another example, the linear program is interpreted as the problem faced by a manufacturer who must select levels $x_1, x_2, \ldots, x_n$ of $n$ production activities in order to meet certain required levels of output $b_1, b_2, \ldots, b_m$ while minimizing production costs, the $y_j$'s are the marginal prices of the outputs. They show directly how much the production cost varies if a small change is made in the output levels.
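
The sensitivity relation (4.8) can be checked numerically. The sketch below reuses the example of Sect. 4.3 under the same assumptions as before (the basis matrix `B` and costs `c_B` are inferred from the dual stated there); the perturbation `delta_b` is a hypothetical value chosen only for illustration, and `solve2` and `optimal_cost` are our own helper names.

```python
from fractions import Fraction as F

def solve2(M, rhs):
    """Solve a 2x2 system M z = rhs exactly by Cramer's rule."""
    (a, b), (c_, d) = M
    det = F(a * d - b * c_)
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c_) / det]

B = [[3, 1], [1, 3]]          # optimal basis (columns a_1, a_2), an assumption
c_B = [18, 12]
b = [F(2), F(2)]
delta_b = [F(1, 10), F(0)]    # hypothetical small change in b

B_T = [[B[0][0], B[1][0]], [B[0][1], B[1][1]]]
y = solve2(B_T, c_B)          # y^T = c_B^T B^{-1}

def optimal_cost(rhs):
    """Cost of the basic solution for this basis; valid while x_B stays >= 0."""
    x_B = solve2(B, rhs)
    assert all(v >= 0 for v in x_B), "basis is no longer feasible"
    return sum(ci * xi for ci, xi in zip(c_B, x_B))

# Change in optimal cost obtained by re-solving with b + delta_b ...
dz_resolve = optimal_cost([bi + di for bi, di in zip(b, delta_b)]) - optimal_cost(b)
# ... and the prediction y^T delta_b of Eq. (4.8).
dz_formula = sum(yi * di for yi, di in zip(y, delta_b))
print(dz_resolve, dz_formula)   # both equal 21/40
```

The agreement holds only while the perturbed basic solution remains nonnegative, which is the nondegeneracy assumption made in the text.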



Complementary Slackness 

The optimal solutions to primal and dual programs satisfy an additional relation that has an economic interpretation. This relation can be stated for any pair of dual linear programs, but we state it here only for the asymmetric and the symmetric pairs defined in Sect. 4.1.

Theorem. (Complementary slackness: asymmetric form). Let $\mathbf{x}$ and $\mathbf{y}$ be feasible solutions for the primal and dual programs, respectively, in the pair (4.2). A necessary and sufficient condition that they both be optimal solutions is that† for all $i$

i) $x_i > 0 \Rightarrow \mathbf{y}^T\mathbf{a}_i = c_i$
ii) $x_i = 0 \Leftarrow \mathbf{y}^T\mathbf{a}_i < c_i$.

Proof. If the stated conditions hold, then clearly $(\mathbf{y}^T\mathbf{A} - \mathbf{c}^T)\mathbf{x} = 0$. Since $\mathbf{A}\mathbf{x} = \mathbf{b}$, this gives $\mathbf{y}^T\mathbf{b} = \mathbf{c}^T\mathbf{x}$, and by the corollary to Lemma 1, Sect. 4.2, the two solutions are optimal. Conversely, if the two solutions are optimal, it must hold, by the Duality Theorem, that $\mathbf{y}^T\mathbf{b} = \mathbf{c}^T\mathbf{x}$ and hence that $(\mathbf{y}^T\mathbf{A} - \mathbf{c}^T)\mathbf{x} = 0$. Since each component of $\mathbf{x}$ is nonnegative and each component of $\mathbf{y}^T\mathbf{A} - \mathbf{c}^T$ is nonpositive, the conditions (i) and (ii) must hold. ∎
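
The asymmetric conditions are easy to test mechanically. The following sketch checks them for the example of Sect. 4.3, assuming the primal data inferred from the dual stated there and the optimal pair $\mathbf{x} = (1/2, 1/2, 0, 0)$, $\mathbf{y} = (21/4, 9/4)$ obtained from the basis $\{\mathbf{a}_1, \mathbf{a}_2\}$. The function name `complementary_slack` and the second feasible point `x_other` are ours.

```python
from fractions import Fraction as F

def complementary_slack(A, c, x, y):
    """Return True if x_i > 0 implies y^T a_i = c_i for every column a_i of A.

    For feasible x and y, condition (ii) is the contrapositive of (i),
    so checking (i) alone is enough.
    """
    for i in range(len(c)):
        y_a_i = sum(y[r] * A[r][i] for r in range(len(A)))
        if x[i] > 0 and y_a_i != c[i]:
            return False
    return True

A = [[3, 1, -2, 1],
     [1, 3, 0, -1]]
b = [2, 2]
c = [18, 12, 2, 6]

x = [F(1, 2), F(1, 2), 0, 0]      # basic solution for the basis {a_1, a_2}
y = [F(21, 4), F(9, 4)]           # corresponding simplex multipliers
x_other = [0, 1, 0, 1]            # another feasible point (basis {a_2, a_4})

primal_value = sum(ci * xi for ci, xi in zip(c, x))
dual_value = sum(yi * bi for yi, bi in zip(y, b))

print(complementary_slack(A, c, x, y))        # True: both solutions are optimal
print(complementary_slack(A, c, x_other, y))  # False: x_4 > 0 but y^T a_4 < c_4
print(primal_value, dual_value)               # 15 and 15
```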

Theorem. (Complementary slackness: symmetric form). Let $\mathbf{x}$ and $\mathbf{y}$ be feasible solutions for the primal and dual programs, respectively, in the pair (4.1). A necessary and sufficient condition that they both be optimal solutions is that for all $i$ and $j$

i) $x_i > 0 \Rightarrow \mathbf{y}^T\mathbf{a}_i = c_i$

ii) $x_i = 0 \Leftarrow \mathbf{y}^T\mathbf{a}_i < c_i$

iii) $y_j > 0 \Rightarrow \mathbf{a}^j\mathbf{x} = b_j$

iv) $y_j = 0 \Leftarrow \mathbf{a}^j\mathbf{x} > b_j$,

(where $\mathbf{a}^j$ is the $j$th row of $\mathbf{A}$).

Proof. This follows by transforming the previous theorem. ∎

The complementary slackness conditions have a rather obvious economic interpretation. Thinking in terms of the diet problem, for example, which is the primal part of a symmetric pair of dual problems, suppose that the optimal diet supplies more than $b_j$ units of the $j$th nutrient. This means that the dietitian would be unwilling to pay anything for small quantities of that nutrient, since availability of it would not reduce the cost of the optimal diet. This, in view of our previous interpretation of $y_j$ as a marginal price, implies $y_j = 0$ which is (iv) of Theorem 4.4. The other conditions have similar interpretations which the reader can work out.
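
For the symmetric form, all four conditions can be checked in the same way. The sketch below uses the small diet-type example of Sect. 4.6 (minimize $3x_1 + 4x_2 + 5x_3$ subject to two "greater than" constraints). The primal solution $\mathbf{x} = (1, 2, 0)$ is the one reported there; the dual values $\mathbf{y} = (1, 1)$ are our own reading of the final tableau and should be treated as an assumption, as should the helper name `symmetric_slackness` and the comparison point `x_other`.

```python
def symmetric_slackness(A, b, c, x, y):
    """Check conditions (i) and (iii); (ii) and (iv) are their contrapositives."""
    m, n = len(A), len(c)
    for i in range(n):
        y_a_i = sum(y[j] * A[j][i] for j in range(m))
        if x[i] > 0 and y_a_i != c[i]:       # (i): active activity is priced exactly
            return False
    for j in range(m):
        a_j_x = sum(A[j][i] * x[i] for i in range(n))
        if y[j] > 0 and a_j_x != b[j]:       # (iii): priced requirement is met exactly
            return False
    return True

A = [[1, 2, 3],
     [2, 2, 1]]
b = [5, 6]
c = [3, 4, 5]

x = [1, 2, 0]          # solution reported in Sect. 4.6
y = [1, 1]             # assumed dual solution (read from the final tableau)
x_other = [3, 1, 0]    # feasible but oversupplies the second requirement

print(symmetric_slackness(A, b, c, x, y))        # True
print(symmetric_slackness(A, b, c, x_other, y))  # False
print(sum(ci * xi for ci, xi in zip(c, x)),
      sum(yj * bj for yj, bj in zip(y, b)))      # 11 11
```

In the second call the surplus on the second constraint together with $y_2 > 0$ violates (iii), which matches the economic reading above: a nutrient supplied in excess should carry a zero price.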



† The symbol ⇒ means “implies” and ⇐ means “is implied by.”

4.5 Max Flow-Min Cut Theorem

One of the most exemplary pairs of linear primal and dual problems is the max-flow and min-cut theorem, which we describe in this section. The maximal flow problem described in Chap. 2 can be expressed more compactly in terms of the node-arc incidence matrix (see Appendix D). Let $\mathbf{x}$ be the vector of arc flows $x_{ij}$ (ordered in any way). Let $\mathbf{A}$ be the corresponding node-arc incidence matrix. Finally, let $\mathbf{e}$ be a vector with dimension equal to the number of nodes and having a $+1$ component on node 1, a $-1$ on node $m$, and all other components zero. The maximal flow problem is then

$$
\begin{array}{ll}
\text{maximize} & f \\
\text{subject to} & \mathbf{A}\mathbf{x} - f\mathbf{e} = \mathbf{0} \\
 & \mathbf{x} \le \mathbf{k}.
\end{array} \tag{4.9}
$$

The coefficient matrix of this problem is equal to the node-arc incidence matrix with an additional column for the flow variable $f$. Any basis of this matrix is triangular, and hence as indicated by the theory in the transportation problem in Chap. 3, the simplex method can be effectively employed to solve this problem. However, instead of the simplex method, a simple algorithm based on the tree algorithm (also see Appendix D) can be used.



Max Flow Augmenting Algorithm 

The basic strategy of the algorithm is quite simple. First we recognize that it is possible to send nonzero flow from node 1 to node $m$ only if node $m$ is reachable from node 1. The tree procedure can be used to determine if $m$ is in fact reachable; and if it is reachable, the algorithm will produce a path from 1 to $m$. By examining the arcs along this path, we can determine the one with minimum capacity. We may then construct a flow equal to this capacity from 1 to $m$ by using this path. This gives us a strictly positive (and integer-valued) initial flow.

Next consider the nature of the network at this point in terms of additional flows that might be assigned. If there is already flow $x_{ij}$ in the arc $(i, j)$, then the effective capacity of that arc is reduced by $x_{ij}$ (to $k_{ij} - x_{ij}$), since that is the maximal amount of additional flow that can be assigned to that arc. On the other hand, the effective reverse capacity, on the arc $(j, i)$, is increased by $x_{ij}$ (to $k_{ji} + x_{ij}$), since a small incremental backward flow is actually realized as a reduction in the forward flow through that arc. Once these changes in capacities have been made, the tree procedure can again be used to find a path from node 1 to node $m$ on which to assign additional flow. (Such a path is termed an augmenting path.) Finally, if $m$ is not reachable from 1, no additional flow can be assigned, and the procedure is complete.

It is seen that the method outlined above is based on repeated application of the tree procedure, which is implemented by labeling and scanning. By including slightly more information in the labels than in the basic tree algorithm, the minimum arc capacity of the augmenting path can be determined during the initial scanning, instead of by reexamining the arcs after the path is found. A typical label at a node $i$ has the form $(k, c_i)$, where $k$ denotes a precursor node and $c_i$ is the maximal flow that can be sent from the source to node $i$ through the path created by the previous labeling and scanning. The complete procedure is this:






Step 0. Set all $x_{ij} = 0$ and $f = 0$.

Step 1. Label node 1 $(-, \infty)$. All other nodes are unlabeled.

Step 2. Select any labeled node $i$ for scanning. Say it has label $(k, c_i)$. For all unlabeled nodes $j$ such that $(i, j)$ is an arc with $x_{ij} < k_{ij}$, assign the label $(i, c_j)$, where $c_j = \min\{c_i, k_{ij} - x_{ij}\}$. For all unlabeled nodes $j$ such that $(j, i)$ is an arc with $x_{ji} > 0$, assign the label $(i, c_j)$, where $c_j = \min\{c_i, x_{ji}\}$.

Step 3. Repeat Step 2 until either node $m$ is labeled or until no more labels can be assigned. In this latter case, the current solution is optimal.

Step 4. (Augmentation.) If the node $m$ is labeled $(i, c_m)$, then increase $f$ and the flow on arc $(i, m)$ by $c_m$. Continue to work backward along the augmenting path determined by the nodes, increasing the flow on each arc of the path by $c_m$ (for a node labeled through a reverse arc $(j, i)$ with $x_{ji} > 0$, this increase is realized as a decrease of $x_{ji}$ by $c_m$). Return to Step 1.

The validity of the algorithm, in particular its finite termination when the capacities are integers, should be fairly apparent. However, a complete proof is deferred until we consider the max flow-min cut theorem below.
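
The steps above can be sketched in a few lines of code. This is a minimal illustration rather than a tuned implementation: it assumes integer capacities (so that termination is guaranteed), stores arcs in a dictionary keyed by `(i, j)`, and runs on a small hypothetical four-node network, since the data of Fig. 4.4 are not reproduced in the text. The function name `max_flow` and the label layout `(arc, direction, c_i)` are our own choices.

```python
import math

def max_flow(capacity, source, sink):
    """Labeling algorithm sketch; capacity maps arcs (i, j) to integers k_ij >= 0."""
    flow = {arc: 0 for arc in capacity}          # Step 0
    total = 0
    while True:
        # Step 1: only the source is labeled. Label = (arc used, direction, c_i).
        labels = {source: (None, 0, math.inf)}
        to_scan = [source]
        # Steps 2-3: scan labeled nodes until the sink is labeled or no label is added.
        while to_scan and sink not in labels:
            i = to_scan.pop()
            c_i = labels[i][2]
            for (u, v), k in capacity.items():
                if u == i and v not in labels and flow[(u, v)] < k:
                    labels[v] = ((u, v), +1, min(c_i, k - flow[(u, v)]))
                    to_scan.append(v)
                elif v == i and u not in labels and flow[(u, v)] > 0:
                    labels[u] = ((u, v), -1, min(c_i, flow[(u, v)]))
                    to_scan.append(u)
        if sink not in labels:
            return total, flow, set(labels)      # current flow is maximal
        # Step 4: augment by c_m, working backward from the sink.
        c_m = labels[sink][2]
        node = sink
        while node != source:
            arc, direction, _ = labels[node]
            flow[arc] += direction * c_m         # backward arcs lose flow
            node = arc[0] if direction == +1 else arc[1]
        total += c_m

# Hypothetical network: node 1 is the source, node 4 the sink.
capacity = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (2, 4): 1, (3, 4): 1}
value, flow, labeled = max_flow(capacity, 1, 4)
print(value, labeled)    # 2 {1}
```

The third return value is the set of nodes labeled in the final, unsuccessful scan; it is the set $S$ used in the proof of the max flow-min cut theorem.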

Example. An example of the above procedure is shown in Fig. 4.4. Node 1 is the 
source, and node 6 is the sink. The original network with capacities indicated on the 
arcs is shown in Fig. 4.4a. Also shown in that figure are the initial labels obtained by 
the procedure. In this case the sink node is labeled, indicating that a flow of 1 unit 
can be achieved. The augmenting path of this flow is shown in Fig. 4.4b. Numbers 
in square boxes indicate the total flow in an arc. The new labels are then found and 
added to that figure. Note that node 2 cannot be labeled from node 1 because there 
is no unused capacity in that direction. Node 2 can, however, be labeled from node 
4, since the existing flow provides a reverse capacity of 1 unit. Again the sink is 
labeled, and 1 unit more flow can be constructed. The augmenting path is shown in 
Fig. 4.4c. A new labeling is appended to that figure. Again the sink is labeled, and 
an additional 1 unit of flow can be sent from source to sink. The path of this 1 unit is 
shown in Fig. 4.4d. Note that it includes a flow from node 4 to node 2, even though 
flow was not allowed in this direction in the original network. This flow is allowable 
now, however, because there is already flow in the opposite direction. The total flow 
at this point is shown in Fig. 4.4e. The flow levels are again in square boxes. This 
flow is maximal, since only the source node can be labeled. 



Max Flow-Min Cut Theorem 

A great deal of insight and some further results can be obtained through the introduction of the notion of cuts in a network. Given a network with source node 1 and sink node $m$, divide the nodes arbitrarily into two sets $S$ and $\bar{S}$ such that the source node is in $S$ and the sink is in $\bar{S}$. The set of arcs from $S$ to $\bar{S}$ is a cut and is denoted $(S, \bar{S})$. The capacity of the cut is the sum of the capacities of the arcs in the cut.

An example of a cut is shown in Fig. 4.5. The set $S$ consists of nodes 1 and 2, while $\bar{S}$ consists of 3, 4, 5, 6. The capacity of this cut is 4.

Fig. 4.5 A cut

It should be clear that a path from node 1 to node $m$ must include at least one arc in any cut, for the path must have an arc from the set $S$ to the set $\bar{S}$. Furthermore, it is clear that the maximal amount of flow that can be sent through a cut is equal to its capacity. Thus each cut gives an upper bound on the value of the maximal flow problem. The max flow-min cut theorem states that equality is actually achieved for some cut. That is, the maximal flow is equal to the minimal cut capacity. It should be noted that the proof of the theorem also establishes the maximality of the flow obtained by the maximal flow algorithm.

Max Flow-Min Cut Theorem. In a network the maximal flow between a source and a sink is equal to the minimal cut capacity of all cuts separating the source and sink.

Proof. Since any cut capacity must be greater than or equal to the maximal flow, it is only necessary to exhibit a flow and a cut for which equality is achieved. Begin with a flow in the network that cannot be augmented by the maximal flow algorithm. For this flow find the effective arc capacities of all arcs for incremental flow changes as described earlier and apply the labeling procedure of the maximal flow algorithm. Since no augmenting path exists, the algorithm must terminate before the sink is labeled.

Let $S$ and $\bar{S}$ consist of all labeled and unlabeled nodes, respectively. This defines a cut separating the source from the sink. All arcs originating in $S$ and terminating in $\bar{S}$ have zero incremental capacity, or else a node in $\bar{S}$ could have been labeled. This means that each arc in the cut is saturated by the original flow; that is, the flow is equal to the capacity. Any arc originating in $\bar{S}$ and terminating in $S$, on the other hand, must have zero flow; otherwise, this would imply a positive incremental capacity in the reverse direction, and the originating node in $\bar{S}$ would be labeled. Thus, there is a total flow from $S$ to $\bar{S}$ equal to the cut capacity, and zero flow from $\bar{S}$ to $S$. This means that the flow from source to sink is equal to the cut capacity. Thus the cut capacity must be minimal, and the flow must be maximal. ∎
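
The construction used in the proof can be traced on a concrete case. The sketch below takes a hypothetical network together with a flow that cannot be augmented, finds the labeled set $S$ by scanning arcs with positive incremental capacity, and compares the capacity of the cut $(S, \bar{S})$ with the flow value. The network, the flow, and the helper names `labeled_set` and `cut_capacity` are illustrative assumptions, not data from the text.

```python
def labeled_set(capacity, flow, source):
    """Nodes reachable from the source through arcs with incremental capacity."""
    S, stack = {source}, [source]
    while stack:
        i = stack.pop()
        for (u, v), k in capacity.items():
            if u == i and v not in S and flow[(u, v)] < k:   # unused forward capacity
                S.add(v)
                stack.append(v)
            if v == i and u not in S and flow[(u, v)] > 0:   # flow that could be reduced
                S.add(u)
                stack.append(u)
    return S

def cut_capacity(capacity, S):
    """Sum of the capacities of the arcs leading from S to its complement."""
    return sum(k for (u, v), k in capacity.items() if u in S and v not in S)

# Hypothetical network (source 1, sink 4) with a reverse arc (3, 2) across the cut.
capacity = {(1, 2): 3, (2, 3): 1, (2, 4): 1, (3, 2): 1, (3, 4): 2}
flow     = {(1, 2): 2, (2, 3): 1, (2, 4): 1, (3, 2): 0, (3, 4): 1}

S = labeled_set(capacity, flow, 1)
flow_value = (sum(f for (u, v), f in flow.items() if u == 1)
              - sum(f for (u, v), f in flow.items() if v == 1))
print(S, cut_capacity(capacity, S), flow_value)   # {1, 2} 2 2
```

Here the arcs $(2, 3)$ and $(2, 4)$ leaving $S$ are saturated and the arc $(3, 2)$ entering $S$ carries no flow, exactly as the proof requires.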

In the network of Fig. 4.4, the minimal cut corresponds to the $S$ consisting only of the source. That cut capacity is 3. Note that in accordance with the max flow-min cut theorem, this is equal to the value of the maximal flow, and the minimal cut is determined by the final labeling in Fig. 4.4e. In Fig. 4.5 the cut shown is also minimal, and the reader should easily be able to determine the pattern of maximal flow.



Relation to Duality 



The character of the max flow-min cut theorem suggests a connection with the 
Duality Theorem. We conclude this section by exploring this connection. 

The maximal flow problem is a linear program, which is expressed formally by (4.9). The dual problem is found to be

$$
\begin{array}{ll}
\text{minimize} & \mathbf{w}^T\mathbf{k} \\
\text{subject to} & \mathbf{u}^T\mathbf{A} = \mathbf{w}^T \\
 & \mathbf{u}^T\mathbf{e} = 1 \\
 & \mathbf{w} \ge \mathbf{0}.
\end{array} \tag{4.10}
$$

When written out in detail, the dual is

$$
\begin{array}{ll}
\text{minimize} & \sum_{ij} w_{ij}k_{ij} \\
\text{subject to} & u_i - u_j = w_{ij} \\
 & u_1 - u_m = 1 \\
 & w_{ij} \ge 0.
\end{array} \tag{4.11}
$$

A pair $i, j$ is included in the above only if $(i, j)$ is an arc of the network.

A feasible solution to this dual problem can be found in terms of any cut set $(S, \bar{S})$. In particular, it is easily seen that

$$
u_i = \begin{cases} 1 & \text{if } i \in S \\ 0 & \text{if } i \in \bar{S} \end{cases}
\qquad
w_{ij} = \begin{cases} 1 & \text{if } (i, j) \in (S, \bar{S}) \\ 0 & \text{otherwise} \end{cases}
\tag{4.12}
$$

is a feasible solution. The value of the dual problem corresponding to this solution is the cut capacity. If we take the cut set to be the one determined by the labeling procedure of the maximal flow algorithm as described in the proof of the theorem above, it can be seen to be optimal by verifying the complementary slackness conditions (a task we leave to the reader). The minimum value of the dual is therefore equal to the minimum cut capacity.

4.6 The Dual Simplex Method



Often there is available a basic solution to a linear program which is not feasible but 
which prices out optimally; that is, the simplex multipliers are feasible for the dual 
problem. In the simplex tableau this situation corresponds to having no negative ele- 
ments in the bottom row but an infeasible basic solution. Such a situation may arise, 
for example, if a solution to a certain linear programming problem is calculated and 
then a new problem is constructed by changing the vector b. In such situations a 
basic feasible solution to the dual is available and hence it is desirable to pivot in 
such a way as to optimize the dual. 

Rather than constructing a tableau for the dual problem (which, if the primal is in standard form, involves $m$ free variables and $n$ nonnegative slack variables), it is more efficient to work on the dual from the primal tableau. The complete technique based on this idea is the dual simplex method. In terms of the primal problem, it operates by maintaining the optimality condition of the last row while working toward feasibility. In terms of the dual problem, however, it maintains feasibility while working toward optimality.

Given the linear program

$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{A}\mathbf{x} = \mathbf{b},\quad \mathbf{x} \ge \mathbf{0},
\end{array}
$$

suppose a basis $\mathbf{B}$ is known such that $\mathbf{y}$ defined by $\mathbf{y}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$ is feasible for the dual. In this case we say that the corresponding basic solution to the primal, $\mathbf{x}_B = \mathbf{B}^{-1}\mathbf{b}$, is dual feasible. If $\mathbf{x}_B \ge \mathbf{0}$ then this solution is also primal feasible and hence optimal.

The given vector $\mathbf{y}$ is feasible for the dual and thus satisfies $\mathbf{y}^T\mathbf{a}_j \le c_j$, for $j = 1, 2, \ldots, n$. Indeed, assuming as usual that the basis is the first $m$ columns of $\mathbf{A}$, there is equality

$$\mathbf{y}^T\mathbf{a}_j = c_j, \quad \text{for } j = 1, 2, \ldots, m, \tag{4.14a}$$

and (barring degeneracy in the dual) there is inequality

$$\mathbf{y}^T\mathbf{a}_j < c_j, \quad \text{for } j = m + 1, \ldots, n. \tag{4.14b}$$

To develop one cycle of the dual simplex method, we find a new vector $\bar{\mathbf{y}}$ such that one of the equalities becomes an inequality and one of the inequalities becomes equality, while at the same time increasing the value of the dual objective function. The $m$ equalities in the new solution then determine a new basis.

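Denote the $i$th row of $\mathbf{B}^{-1}$ by $\mathbf{u}^i$. Then for

$$\bar{\mathbf{y}}^T = \mathbf{y}^T - \varepsilon\mathbf{u}^i, \tag{4.15}$$

we have $\bar{\mathbf{y}}^T\mathbf{a}_j = \mathbf{y}^T\mathbf{a}_j - \varepsilon\mathbf{u}^i\mathbf{a}_j$. Thus, recalling that $z_j = \mathbf{y}^T\mathbf{a}_j$ and noting that $\mathbf{u}^i\mathbf{a}_j = y_{ij}$, the $ij$th element of the tableau, we have

$$\bar{\mathbf{y}}^T\mathbf{a}_j = c_j, \quad j = 1, 2, \ldots, m,\ j \ne i \tag{4.16a}$$

$$\bar{\mathbf{y}}^T\mathbf{a}_i = c_i - \varepsilon \tag{4.16b}$$

$$\bar{\mathbf{y}}^T\mathbf{a}_j = z_j - \varepsilon y_{ij}, \quad j = m + 1, m + 2, \ldots, n. \tag{4.16c}$$

Also,

$$\bar{\mathbf{y}}^T\mathbf{b} = \mathbf{y}^T\mathbf{b} - \varepsilon x_{B_i}. \tag{4.17}$$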

These last equations lead directly to the algorithm:

Step 1. Given a dual feasible basic solution $\mathbf{x}_B$, if $\mathbf{x}_B \ge \mathbf{0}$ the solution is optimal. If $\mathbf{x}_B$ is not nonnegative, select an index $i$ such that the $i$th component of $\mathbf{x}_B$ satisfies $x_{B_i} < 0$.

Step 2. If all $y_{ij} \ge 0$, $j = 1, 2, \ldots, n$, then the dual has no maximum (this follows since by (4.16) $\bar{\mathbf{y}}$ is feasible for all $\varepsilon > 0$). If $y_{ij} < 0$ for some $j$, then let

$$\varepsilon_0 = \frac{z_k - c_k}{y_{ik}} = \min_j\left\{\frac{z_j - c_j}{y_{ij}} : y_{ij} < 0\right\}. \tag{4.18}$$

Step 3. Form a new basis $\mathbf{B}$ by replacing $\mathbf{a}_i$ by $\mathbf{a}_k$. Using this basis determine the corresponding basic dual feasible solution $\mathbf{x}_B$ and return to Step 1.

The proof that the algorithm converges to the optimal solution is similar in its details to the proof for the primal simplex procedure. The essential observations are: (a) from the choice of $k$ in (4.18) and from (4.16a), (4.16b), (4.16c) the new solution will again be dual feasible; (b) by (4.17) and the choice $x_{B_i} < 0$, the value of the dual objective will increase; (c) the procedure cannot terminate at a nonoptimum point; and (d) since there are only a finite number of bases, the optimum must be achieved in a finite number of steps.

Example. A form of problem arising frequently is that of minimizing a positive combination of positive variables subject to a series of “greater than” type inequalities having positive coefficients. Such problems are natural candidates for application of the dual simplex procedure. The classical diet problem is of this type as is the simple example below.

$$
\begin{array}{ll}
\text{minimize} & 3x_1 + 4x_2 + 5x_3 \\
\text{subject to} & x_1 + 2x_2 + 3x_3 \ge 5 \\
 & 2x_1 + 2x_2 + x_3 \ge 6 \\
 & x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{array}
$$

By introducing surplus variables and by changing the sign of the inequalities we obtain the initial tableau

$$
\begin{array}{rrrrrr}
-1 & -2 & -3 & 1 & 0 & -5 \\
\boxed{-2} & -2 & -1 & 0 & 1 & -6 \\
3 & 4 & 5 & 0 & 0 & 0
\end{array}
$$

Initial tableau

The basis corresponds to a dual feasible solution since all of the $c_j - z_j$'s are nonnegative. We select any $x_{B_i} < 0$, say $x_5 = -6$, to remove from the set of basic variables. To find the appropriate pivot element in the second row we compute the ratios $(z_j - c_j)/y_{2j}$ and select the minimum positive ratio. This yields the pivot indicated. Continuing, the remaining tableaus are

$$
\begin{array}{rrrrrr}
0 & \boxed{-1} & -5/2 & 1 & -1/2 & -2 \\
1 & 1 & 1/2 & 0 & -1/2 & 3 \\
0 & 1 & 7/2 & 0 & 3/2 & -9
\end{array}
$$

Second tableau

$$
\begin{array}{rrrrrr}
0 & 1 & 5/2 & -1 & 1/2 & 2 \\
1 & 0 & -2 & 1 & -1 & 1 \\
0 & 0 & 1 & 1 & 1 & -11
\end{array}
$$

Final tableau

The third tableau yields a feasible solution to the primal which must be optimal. Thus the solution is $x_1 = 1$, $x_2 = 2$, $x_3 = 0$.
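
The three steps of the dual simplex method can be sketched directly on the tableau of this example. The code below is a minimal illustration, not a robust solver: it assumes the starting tableau is dual feasible, stores the relative costs in the last row and $\mathbf{b}$ in the last column, picks the most negative basic variable in Step 1 (the text allows any negative one), and uses exact fractions. The function name `dual_simplex` and the column indexing are our own.

```python
from fractions import Fraction as F

def dual_simplex(T, basis):
    """Dual simplex on tableau T; basis[r] is the basic column of row r."""
    T = [[F(v) for v in row] for row in T]
    m, n = len(T) - 1, len(T[0]) - 1
    while True:
        # Step 1: choose a row with negative right-hand side (most negative here).
        i = min(range(m), key=lambda r: T[r][n])
        if T[i][n] >= 0:
            return T, basis                  # primal feasible, hence optimal
        # Step 2: ratio test (4.18) over the negative entries of row i.
        cols = [j for j in range(n) if T[i][j] < 0]
        if not cols:
            raise ValueError("dual has no maximum; primal is infeasible")
        k = min(cols, key=lambda j: T[m][j] / -T[i][j])
        # Step 3: pivot on element (i, k) and update the basis.
        T[i] = [v / T[i][k] for v in T[i]]
        for r in range(m + 1):
            if r != i:
                factor = T[r][k]
                T[r] = [a - factor * b for a, b in zip(T[r], T[i])]
        basis[i] = k

# Initial tableau of the example; columns 3 and 4 are the surplus variables.
tableau = [[-1, -2, -3, 1, 0, -5],
           [-2, -2, -1, 0, 1, -6],
           [ 3,  4,  5, 0, 0,  0]]
T, basis = dual_simplex(tableau, [3, 4])

x = [F(0)] * 5
for row, j in enumerate(basis):
    x[j] = T[row][-1]
cost = -T[-1][-1]        # the corner entry holds the negative of the cost
print(x[:3], cost)       # x = (1, 2, 0) with cost 11
```

The ratio in Step 2 is written as relative cost divided by $-y_{ij}$, which equals $(z_j - c_j)/y_{ij}$ since the last row stores $c_j - z_j$.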



*4.7 The Primal-Dual Algorithm



In this section a procedure is described for solving linear programming problems by working simultaneously on the primal and the dual problems. The procedure begins with a feasible solution to the dual that is improved at each step by optimizing an associated restricted primal problem. As the method progresses it can be regarded as striving to achieve the complementary slackness conditions for optimality. Originally, the primal-dual method was developed for solving a special kind of linear program arising in network flow problems, and it continues to be the most efficient procedure for these problems. (For general linear programs the dual simplex method is most frequently used.) In this section we describe the generalized version of the algorithm and point out an interesting economic interpretation of it. We consider the program

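$$
\begin{array}{ll}
\text{minimize} & \mathbf{c}^T\mathbf{x} \\
\text{subject to} & \mathbf{A}\mathbf{x} = \mathbf{b},\quad \mathbf{x} \ge \mathbf{0}
\end{array} \tag{4.19}
$$

and the corresponding dual program

$$
\begin{array}{ll}
\text{maximize} & \mathbf{y}^T\mathbf{b} \\
\text{subject to} & \mathbf{y}^T\mathbf{A} \le \mathbf{c}^T.
\end{array} \tag{4.20}
$$

Given a feasible solution $\mathbf{y}$ to the dual, define the subset $P$ of $1, 2, \ldots, n$ by $i \in P$ if $\mathbf{y}^T\mathbf{a}_i = c_i$ where $\mathbf{a}_i$ is the $i$th column of $\mathbf{A}$. Thus, since $\mathbf{y}$ is dual feasible, it follows that $i \notin P$ implies $\mathbf{y}^T\mathbf{a}_i < c_i$. Now corresponding to $\mathbf{y}$ and $P$, we define the associated restricted primal problem

$$
\begin{array}{ll}
\text{minimize} & \mathbf{1}^T\mathbf{y} \\
\text{subject to} & \mathbf{A}\mathbf{x} + \mathbf{y} = \mathbf{b} \\
 & \mathbf{x} \ge \mathbf{0},\quad x_i = 0 \ \text{for } i \notin P \\
 & \mathbf{y} \ge \mathbf{0},
\end{array} \tag{4.21}
$$

where $\mathbf{1}$ denotes the $m$-vector $(1, 1, \ldots, 1)$. (Within the restricted primal (4.21) the symbol $\mathbf{y}$ denotes the vector of artificial variables, not the dual vector.)

The dual of this associated restricted primal is called the associated restricted dual. It is

$$
\begin{array}{ll}
\text{maximize} & \mathbf{u}^T\mathbf{b} \\
\text{subject to} & \mathbf{u}^T\mathbf{a}_i \le 0, \quad i \in P \\
 & \mathbf{u} \le \mathbf{1}.
\end{array} \tag{4.22}
$$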

The condition for optimality of the primal-dual method is expressed in the following theorem.

Primal-Dual Optimality Theorem. Suppose that $\mathbf{y}$ is feasible for the dual and that $\mathbf{x}$, together with the artificial vector $\mathbf{y} = \mathbf{0}$, is feasible (and of course optimal) for the associated restricted primal. Then $\mathbf{x}$ and $\mathbf{y}$ are optimal for the original primal and dual programs, respectively.

Proof. Clearly $\mathbf{x}$ is feasible for the primal. Also we have $\mathbf{c}^T\mathbf{x} = \mathbf{y}^T\mathbf{A}\mathbf{x}$, because $\mathbf{y}^T\mathbf{A}$ is identical to $\mathbf{c}^T$ on the components corresponding to nonzero elements of $\mathbf{x}$. Thus $\mathbf{c}^T\mathbf{x} = \mathbf{y}^T\mathbf{A}\mathbf{x} = \mathbf{y}^T\mathbf{b}$ and optimality follows from Lemma 1, Sect. 4.2. ∎

The primal-dual method starts with a feasible solution to the dual and then optimizes the associated restricted primal. If the optimal solution to this associated restricted primal is not feasible for the primal, the feasible solution to the dual is improved and a new associated restricted primal is determined. Here are the details:

Step 1. Given a feasible solution $\mathbf{y}_0$ to the dual program (4.20), determine the associated restricted primal according to (4.21).

Step 2. Optimize the associated restricted primal. If the minimal value of this problem is zero, the corresponding solution is optimal for the original primal by the Primal-Dual Optimality Theorem.

Step 3. If the minimal value of the associated restricted primal is strictly positive, obtain from the final simplex tableau of the restricted primal, the solution $\mathbf{u}_0$ of the associated restricted dual (4.22). If there is no $j$ for which $\mathbf{u}_0^T\mathbf{a}_j > 0$ conclude the primal has no feasible solutions. If, on the other hand, for at least one $j$, $\mathbf{u}_0^T\mathbf{a}_j > 0$, define the new dual feasible vector

$$\mathbf{y} = \mathbf{y}_0 + \varepsilon_0\mathbf{u}_0$$

where

$$\varepsilon_0 = \frac{c_k - \mathbf{y}_0^T\mathbf{a}_k}{\mathbf{u}_0^T\mathbf{a}_k} = \min_j\left\{\frac{c_j - \mathbf{y}_0^T\mathbf{a}_j}{\mathbf{u}_0^T\mathbf{a}_j} : \mathbf{u}_0^T\mathbf{a}_j > 0\right\}.$$

Now go back to Step 1 using this $\mathbf{y}$.

To prove convergence of this method a few simple observations and explanations must be made. First we verify the statement made in Step 3 that $\mathbf{u}_0^T\mathbf{a}_j \le 0$ for all $j$ implies that the primal has no feasible solution. The vector $\mathbf{y}_\varepsilon = \mathbf{y}_0 + \varepsilon\mathbf{u}_0$ is feasible for the dual problem for all positive $\varepsilon$, since $\mathbf{u}_0^T\mathbf{A} \le \mathbf{0}$. In addition, $\mathbf{y}_\varepsilon^T\mathbf{b} = \mathbf{y}_0^T\mathbf{b} + \varepsilon\mathbf{u}_0^T\mathbf{b}$ and, since $\mathbf{u}_0^T\mathbf{b} = \mathbf{1}^T\mathbf{y} > 0$, we see that as $\varepsilon$ is increased we obtain an unbounded solution to the dual. In view of the Duality Theorem, this implies that there is no feasible solution to the primal.

Next suppose that in Step 3, for at least one $j$, $\mathbf{u}_0^T\mathbf{a}_j > 0$. Again we define the family of vectors $\mathbf{y}_\varepsilon = \mathbf{y}_0 + \varepsilon\mathbf{u}_0$. Since $\mathbf{u}_0$ is a solution to (4.22) we have $\mathbf{u}_0^T\mathbf{a}_i \le 0$ for $i \in P$, and hence for small positive $\varepsilon$ the vector $\mathbf{y}_\varepsilon$ is feasible for the dual. We increase $\varepsilon$ to the first point where one of inequalities $\mathbf{y}_\varepsilon^T\mathbf{a}_j < c_j$, $j \notin P$ becomes an equality. This determines $\varepsilon_0 > 0$ and $k$. The new $\mathbf{y}$ vector corresponds to an increased value of the dual objective $\mathbf{y}^T\mathbf{b} = \mathbf{y}_0^T\mathbf{b} + \varepsilon_0\mathbf{u}_0^T\mathbf{b}$. In addition, the corresponding new set $P$ now includes the index $k$. Any other index $i$ that corresponded to a positive value of $x_i$ in the associated restricted primal is in the new set $P$, because by complementary slackness $\mathbf{u}_0^T\mathbf{a}_i = 0$ for such an $i$ and thus $\mathbf{y}^T\mathbf{a}_i = \mathbf{y}_0^T\mathbf{a}_i + \varepsilon_0\mathbf{u}_0^T\mathbf{a}_i = c_i$. This means that the old optimal solution is feasible for the new associated restricted primal and that $\mathbf{a}_k$ can be pivoted into the basis. Since $\mathbf{u}_0^T\mathbf{a}_k > 0$, pivoting in $\mathbf{a}_k$ will decrease the value of the associated restricted primal.

In summary, it has been shown that at each step either an improvement in the 
associated primal is made or an infeasibility condition is detected. Assuming non- 
degeneracy, this implies that no basis of the associated primal is repeated — and since 
there are only a finite number of possible bases, the solution is reached in a finite 
number of steps. 

The primal-dual algorithm can be given an interesting interpretation in terms of the manufacturing problem in Example 2, Sect. 2.2. Suppose we own a facility that is capable of engaging in $n$ different production activities each of which produces various amounts of $m$ commodities. Each activity $i$ can be operated at any level $x_i \ge 0$, but when operated at the unity level the $i$th activity costs $c_i$ dollars and yields the $m$ commodities in the amounts specified by the $m$-vector $\mathbf{a}_i$. Assuming linearity of the production facility, if we are given a vector $\mathbf{b}$ describing output requirements of the $m$ commodities, and we wish to produce these at minimum cost, ours is the primal problem.

Imagine that an entrepreneur not knowing the value of our requirements vector $\mathbf{b}$ decides to sell us these requirements directly. He assigns a price vector $\mathbf{y}_0$ to these requirements such that $\mathbf{y}_0^T\mathbf{A} \le \mathbf{c}^T$. In this way his prices are competitive with our production activities, and he can assure us that purchasing directly from him is no more costly than engaging activities. As owner of the production facilities we are reluctant to abandon our production enterprise but, on the other hand, we deem it not frugal to engage an activity whose output can be duplicated by direct purchase for lower cost. Therefore, we decide to engage only activities that cannot be duplicated cheaper, and at the same time we attempt to minimize the total business volume given the entrepreneur. Ours is the associated restricted primal problem.

Upon receiving our order, the greedy entrepreneur decides to modify his prices in such a manner as to keep them competitive with our activities but increase the cost of our order. As a reasonable and simple approach he seeks new prices of the form

$$\mathbf{y} = \mathbf{y}_0 + \varepsilon\mathbf{u}_0,$$

where he selects $\mathbf{u}_0$ as the solution to

$$
\begin{array}{ll}
\text{maximize} & \mathbf{u}^T\mathbf{y} \\
\text{subject to} & \mathbf{u}^T\mathbf{a}_i \le 0, \quad i \in P \\
 & \mathbf{u} \le \mathbf{1}.
\end{array}
$$

The first set of constraints is to maintain competitiveness of his new price vector for small $\varepsilon$, while the second set is an arbitrary bound imposed to keep this subproblem bounded. It is easily shown that the solution $\mathbf{u}_0$ to this problem is identical to the solution of the associated dual (4.22). After determining the maximum $\varepsilon$ to maintain feasibility, he announces his new prices.

At this point, rather than concede to the price adjustment, we recalculate the new minimum volume order based on the new prices. As the greedy (and shortsighted) entrepreneur continues to change his prices in an attempt to maximize profit he eventually finds he has reduced his business to zero! At that point we have, with his help, solved the original primal problem.

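Example. To illustrate the primal-dual method and indicate how it can be implemented through use of the tableau format consider the following problem:

$$
\begin{array}{ll}
\text{minimize} & 2x_1 + x_2 + 4x_3 \\
\text{subject to} & x_1 + x_2 + 2x_3 = 3 \\
 & 2x_1 + x_2 + 3x_3 = 5 \\
 & x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{array}
$$

Because all of the coefficients in the objective function are nonnegative, $\mathbf{y} = (0, 0)$ is a feasible vector for the dual. We lay out the simplex tableau shown below

$$
\begin{array}{rrrrrr}
\mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \cdot & \cdot & \mathbf{b} \\
1 & 1 & 2 & 1 & 0 & 3 \\
2 & 1 & 3 & 0 & 1 & 5 \\
-3 & -2 & -5 & 0 & 0 & -8 \\
2 & 1 & 4 & \cdot & \cdot & \cdot
\end{array}
$$

First tableau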



To form this tableau we have adjoined artificial variables in the usual manner. The third row gives the relative cost coefficients of the associated primal problem, the same as the row that would be used in a phase I procedure. In the fourth row are listed the $c_i - \mathbf{y}^T\mathbf{a}_i$'s for the current $\mathbf{y}$. The allowable columns in the associated restricted primal are determined by the zeros in this last row.

Since there are no zeros in the last row, no progress can be made in the associated restricted primal and hence the original solution $x_1 = x_2 = x_3 = 0$, $y_1 = 3$, $y_2 = 5$ is optimal for this $\mathbf{y}$. The solution $\mathbf{u}_0$ to the associated restricted dual is $\mathbf{u}_0 = (1, 1)$, and the numbers $-\mathbf{u}_0^T\mathbf{a}_i$, $i = 1, 2, 3$ are equal to the first three elements in the third row. Thus, we compute the three ratios $\frac{2}{3}, \frac{1}{2}, \frac{4}{5}$ from which we find $\varepsilon_0 = \frac{1}{2}$. The new values for the fourth row are now found by adding $\varepsilon_0$ times the (first three) elements of the third row to the fourth row.



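$$
\begin{array}{rrrrrr}
\mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \cdot & \cdot & \mathbf{b} \\
1 & \boxed{1} & 2 & 1 & 0 & 3 \\
2 & 1 & 3 & 0 & 1 & 5 \\
-3 & -2 & -5 & 0 & 0 & -8 \\
1/2 & 0 & 3/2 & \cdot & \cdot & \cdot
\end{array}
$$

Second tableau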



Minimizing the new associated restricted primal by pivoting as indicated we obtain 



      a1     a2     a3      ·      ·      b
       1      1      2      1      0      3
       1      0      1     -1      1      2
      -1      0     -1      2      0     -2
     1/2      0    3/2      ·      ·      ·









Now we again calculate the ratios $\frac{1}{2}, \frac{3}{2}$, obtaining $\varepsilon_0 = \frac{1}{2}$, and add this multiple of
the third row to the fourth row to obtain the next tableau.

      a1     a2     a3      ·      ·      b
       1      1      2      1      0      3
     (1)      0      1     -1      1      2
      -1      0     -1      2      0     -2
       0      0      1      ·      ·      ·

                  Third tableau

Optimizing the new restricted primal we obtain the tableau:

      a1     a2     a3      ·      ·      b
       0      1      1      2     -1      1
       1      0      1     -1      1      2
       0      0      0      1      1      0
       0      0      1      ·      ·      ·

                  Final tableau

Having obtained feasibility in the primal, we conclude that the solution is also
optimal: $x_1 = 2$, $x_2 = 1$, $x_3 = 0$.
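The dual updates in this example can be replayed with exact rational arithmetic. The sketch below is only a check under stated assumptions: the helper names `reduced_costs` and `dual_step` are illustrative, and the second direction u = (-1, 1) is inferred from the third row of the tableau obtained after the first pivot (its entries in the artificial columns, 2 and 0, equal 1 - u).

```python
from fractions import Fraction as F

# Data of the example: minimize c^T x subject to A x = b, x >= 0.
A = [[1, 1, 2],
     [2, 1, 3]]
b = [3, 5]
c = [2, 1, 4]


def reduced_costs(y):
    """Return the fourth tableau row: c_i - y^T a_i for each column a_i."""
    return [c[i] - sum(y[r] * A[r][i] for r in range(2)) for i in range(3)]


def dual_step(y, u):
    """Move y to y + eps*u with the largest eps keeping c - y^T A >= 0."""
    r = reduced_costs(y)
    ua = [sum(u[k] * A[k][i] for k in range(2)) for i in range(3)]
    # Ratio test over the columns with u^T a_i > 0.
    eps = min(r[i] / ua[i] for i in range(3) if ua[i] > 0)
    return [y[k] + eps * u[k] for k in range(2)], eps


y = [F(0), F(0)]                      # starting dual feasible vector
y, eps1 = dual_step(y, [F(1), F(1)])  # first restricted dual solution u0 = (1, 1)
y, eps2 = dual_step(y, [F(-1), F(1)]) # direction inferred from the later tableau

x = [2, 1, 0]                         # primal solution read from the final tableau
primal_value = sum(c[i] * x[i] for i in range(3))
dual_value = sum(y[k] * b[k] for k in range(2))
print(eps1, eps2, y, reduced_costs(y), primal_value, dual_value)
```

Equal primal and dual objective values, together with primal feasibility of x, confirm optimality by the duality theorem.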



4.8 Summary 



There is a corresponding dual linear program associated with every (primal) linear 
program. Both programs share the same underlying cost and constraint coefficients. 
We have established several theorems that relate the pair. The variables of the dual
problem can be interpreted as prices associated with the constraints of the original
(primal) problem, and through this association it is possible to give an economically
meaningful characterization to the dual whenever there is such a characterization
for the primal.

Mathematically, each problem of the pair provides an optimality certificate for the
other: one cannot claim that a primal objective value is optimal without exhibiting a
solution of the dual that achieves the same value of the dual objective. This also leads
to the set of optimality conditions, including the complementarity conditions, that we
will see many times in the rest of the book.



4.9 Exercises 

1. Verify in detail that the dual of the dual of a linear program is the original problem.

2. Show that if a linear inequality in a linear program is changed to equality, the
corresponding dual variable becomes free.

3. Find the dual of

$$
\begin{aligned}
\text{minimize}\quad & c^T x \\
\text{subject to}\quad & Ax = b,\ x \ge a
\end{aligned}
$$

where $a \ge 0$.

4. Show that in the transportation problem the linear equality constraints are not 
linearly independent, and that in an optimal solution to the dual problem the 
dual variables are not unique. Generalize this observation to any linear program 
having redundant equality constraints. 

5. Construct an example of a primal problem that has no feasible solutions and 
whose corresponding dual also has no feasible solutions. 

6. Let A be an m × n matrix and c be an n-vector. Prove that $Ax \le 0$ implies $c^T x \le 0$
if and only if $c^T = y^T A$ for some $y \ge 0$. Give a geometric interpretation of the
result.

7. There is in general a strong connection between the theories of optimization and 
free competition, which is illustrated by an idealized model of activity location. 
Suppose there are n economic activities (various factories, homes, stores, etc.) 
that are to be individually located on n distinct parcels of land. If activity i is
located on parcel j that activity can yield $s_{ij}$ units (dollars) of value.

If the assignment of activities to land parcels is made by a central authority, it
might be made in such a way as to maximize the total value generated. In other
words, the assignment would be made so as to maximize $\sum_i \sum_j s_{ij} x_{ij}$ where

$$
x_{ij} = \begin{cases} 1 & \text{if activity } i \text{ is assigned to parcel } j \\ 0 & \text{otherwise.} \end{cases}
$$







More explicitly this approach leads to the optimization problem

$$
\begin{aligned}
\text{maximize}\quad & \sum_i \sum_j s_{ij} x_{ij} \\
\text{subject to}\quad & \sum_j x_{ij} = 1, \quad i = 1, 2, \ldots, n \\
& \sum_i x_{ij} = 1, \quad j = 1, 2, \ldots, n \\
& x_{ij} \ge 0, \quad x_{ij} = 0 \text{ or } 1.
\end{aligned}
$$

Actually, it can be shown that the final requirement ($x_{ij}$ = 0 or 1) is automatically
satisfied at any extreme point of the set defined by the other constraints, so
that in fact the optimal assignment can be found by using the simplex method
of linear programming.

If one considers the problem from the viewpoint of free competition, it is 
assumed that, rather than a central authority determining the assignment, the 
individual activities bid for the land and thereby establish prices. 

(a) Show that there exists a set of activity prices $p_i$, $i = 1, 2, \ldots, n$ and land
prices $q_j$, $j = 1, 2, \ldots, n$ such that

$$p_i + q_j \ge s_{ij}, \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, n$$

with equality holding if in an optimal assignment activity i is assigned to
parcel j.

(b) Show that Part (a) implies that if activity i is optimally assigned to parcel j
and if $j'$ is any other parcel

$$s_{ij} - q_j \ge s_{ij'} - q_{j'}.$$

Give an economic interpretation of this result and explain the relation
between free competition and optimality in this context.

(c) Assuming that each $s_{ij}$ is positive, show that the prices can all be assumed
to be nonnegative.

8. Construct the dual of the combinatorial auction problem of Example 7 of
Chap. 2, and give an economic interpretation for each type of dual variable.

9. Game theory is in part related to linear programming theory. Consider the game 
in which player X may select any one of m moves, and player Y may select any 
one of n moves. If X selects i and Y selects j, then X wins an amount $a_{ij}$ from Y.
The game is repeated many times. Player X develops a mixed strategy where the
various moves are played according to probabilities represented by the components
of the vector $x = (x_1, x_2, \ldots, x_m)$, where $x_i \ge 0$, $i = 1, 2, \ldots, m$
and $\sum_{i=1}^{m} x_i = 1$. Likewise Y develops a mixed strategy $y = (y_1, y_2, \ldots, y_n)$,
where $y_i \ge 0$, $i = 1, 2, \ldots, n$ and $\sum_{i=1}^{n} y_i = 1$. The average payoff to X is then

$$P(x, y) = x^T A y.$$
(a) Suppose X selects x as the solution to the linear program

$$
\begin{aligned}
\text{maximize}\quad & A \\
\text{subject to}\quad & \sum_{i=1}^{m} x_i = 1 \\
& \sum_{i=1}^{m} x_i a_{ij} \ge A, \quad j = 1, 2, \ldots, n \\
& x_i \ge 0, \quad i = 1, 2, \ldots, m.
\end{aligned}
$$

Show that X is guaranteed a payoff of at least A no matter what y is chosen
by Y.

(b) Show that the dual of the problem above is

$$
\begin{aligned}
\text{minimize}\quad & B \\
\text{subject to}\quad & \sum_{j=1}^{n} y_j = 1 \\
& \sum_{j=1}^{n} a_{ij} y_j \le B, \quad i = 1, 2, \ldots, m \\
& y_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{aligned}
$$

(c) Prove that max A = min B. (The common value is called the value of the
game.)

(d) Consider the “matching” game. Each player selects heads or tails. If the 
choices match, X wins $1 from Y; if they do not match, Y wins $1 from X. 
Find the value of this game and the optimal mixed strategies. 

(e) Repeat Part (d) for the game where each player selects either 1, 2, or 3. 
The player with the highest number wins $1 unless that number is exactly 
1 higher than the other player’s number, in which case he loses $3. When 
the numbers are equal there is no payoff. 

10. Consider the primal linear program in the standard form. Suppose that this 
program and its dual are feasible. Let y be a known optimal solution to the 
dual. 

(a) If the kth equation of the primal is multiplied by $\mu \ne 0$, determine an optimal
solution w to the dual of this new problem.

(b) Suppose that, in the original primal, we add $\mu$ times the kth equation to
the rth equation. What is an optimal solution w to the corresponding dual
problem?

(c) Suppose, in the original primal, we add $\mu$ times the kth row of A to c. What
is an optimal solution to the corresponding dual problem?

11. Consider the linear program (P) of the form

$$
\begin{aligned}
\text{minimize}\quad & q^T z \\
\text{subject to}\quad & Mz \ge -q,\ z \ge 0
\end{aligned}
$$

in which the matrix M is skew symmetric; that is, $M = -M^T$.

(a) Show that problem (P) and its dual are the same.

(b) A problem of the kind in part (a) is said to be self-dual. An example of a
self-dual problem has

$$
M = \begin{bmatrix} 0 & -A^T \\ A & 0 \end{bmatrix}, \quad
q = \begin{bmatrix} c \\ -b \end{bmatrix}, \quad
z = \begin{bmatrix} x \\ y \end{bmatrix}.
$$

Give an interpretation of the problem with this data.

(c) Show that a self-dual linear program has an optimal solution if and only if 
it is feasible. 

12. A company may manufacture n different products, each of which uses various 
amounts of m limited resources. Each unit of product i yields a profit of $c_i$
dollars and uses $a_{ji}$ units of the jth resource. The available amount of the jth
resource is $b_j$. To maximize profit the company selects the quantities $x_i$ to be
manufactured of each product by solving

$$
\begin{aligned}
\text{maximize}\quad & c^T x \\
\text{subject to}\quad & Ax = b,\ x \ge 0.
\end{aligned}
$$

The unit profits $c_i$ already take into account the variable cost associated with
manufacturing each unit. In addition to that cost, the company incurs a fixed
overhead H, and for accounting purposes it wants to allocate this overhead to
each of its products. In other words, it wants to adjust the unit profits so as to
account for the overhead. Such an overhead allocation scheme must satisfy two
conditions: (4.1) Since H is fixed regardless of the product mix, the overhead
allocation scheme must not alter the optimal solution, (4.2) All the overhead
must be allocated; that is, the optimal value of the objective with the modified
cost coefficients must be H dollars lower than $z_0$, the original optimal value of
the objective.

(a) Consider the allocation scheme in which the unit profits are modified
according to $\hat{c}^T = c^T - r y_0^T A$, where $y_0$ is the optimal solution to the original
dual and $r = H/z_0$ (assume $H \le z_0$).

(i) Show that the optimal x for the modified problem is the same as that
for the original problem, and the new dual solution is $\hat{y}_0 = (1 - r) y_0$.

(ii) Show that this approach fully allocates H.

(b) Suppose that the overhead can be traced to each of the resource constraints.
Let $H_i \ge 0$ be the amount of overhead associated with the ith resource,
where $\sum_{i=1}^{m} H_i \le z_0$ and $r_i = H_i/b_i \le (y_0)_i$ for $i = 1, \ldots, m$. Based on this
information, an allocation scheme has been proposed where the unit profits
are modified such that $\hat{c}^T = c^T - r^T A$.

(i) Show that the optimal x for this modified problem is the same as that for
the original problem, and the corresponding dual solution is $\hat{y}_0 = y_0 - r$.

(ii) Show that this scheme fully allocates H.
13. Solve the linear inequalities

$$
\begin{aligned}
-2x_1 + 2x_2 &\le -1 \\
2x_1 - x_2 &\le 2 \\
-4x_2 &\le 3 \\
-15x_1 - 12x_2 &\le -2 \\
12x_1 + 20x_2 &\le -1.
\end{aligned}
$$

Note that $x_1$ and $x_2$ are not restricted to be positive. Solve this problem by
considering the problem of maximizing $0 \cdot x_1 + 0 \cdot x_2$ subject to these constraints,
taking the dual and using the simplex method.

14. (a) Using the simplex method solve 

$$
\begin{aligned}
\text{minimize}\quad & 2x_1 - x_2 \\
\text{subject to}\quad & 2x_1 - x_2 - x_3 \ge 3 \\
& x_1 - x_2 + x_3 \ge 2 \\
& x_i \ge 0, \quad i = 1, 2, 3.
\end{aligned}
$$

(Hint: Note that $x_1 = 2$ gives a feasible solution.)

(b) What is the dual problem and its optimal solution? 

15. (a) Using the simplex method solve 

$$
\begin{aligned}
\text{minimize}\quad & 2x_1 + 3x_2 + 2x_3 + 2x_4 \\
\text{subject to}\quad & x_1 + 2x_2 + x_3 + 2x_4 = 3 \\
& x_1 + x_2 + 2x_3 + 4x_4 = 3 \\
& x_i \ge 0, \quad i = 1, 2, 3, 4.
\end{aligned}
$$

(b) Using the work done in Part (a) and the dual simplex method, solve the 
same problem but with the right-hand sides of the equations changed to 8 
and 7 respectively. 

16. For the problem 

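$$
\begin{aligned}
\text{minimize}\quad & 5x_1 + 3x_2 \\
\text{subject to}\quad & 2x_1 - x_2 + 4x_3 \le 4 \\
& x_1 + x_2 + 2x_3 \le 5 \\
& 2x_1 - x_2 + x_3 \ge 1 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0;
\end{aligned}
$$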

(a) Using a single pivot operation with pivot element 1, find a feasible solution. 

(b) Using the simplex method, solve the problem. 

(c) What is the dual problem? 

(d) What is the solution to the dual? 
17. Solve the following problem by the dual simplex method:

$$
\begin{aligned}
\text{minimize}\quad & -7x_1 + 7x_2 - 2x_3 - x_4 - 6x_5 \\
\text{subject to}\quad & 3x_1 - x_2 + x_3 - 2x_4 = -3 \\
& 2x_1 + x_2 + x_4 + x_5 = 4 \\
& -x_1 + 3x_2 - 3x_4 + x_6 = 12 \\
\text{and}\quad & x_i \ge 0, \quad i = 1, \ldots, 6.
\end{aligned}
$$

18. Given the linear programming problem in standard form (4.3) suppose a basis B 
and the corresponding (not necessarily feasible) primal and dual basic solutions 
$\mathbf{x}$ and $\mathbf{y}$ are known. Assume that at least one relative cost coefficient $c_i - \mathbf{y}^T \mathbf{a}_i$ is
negative. Consider the auxiliary problem

$$
\begin{aligned}
\text{minimize}\quad & \mathbf{c}^T \mathbf{x} \\
\text{subject to}\quad & A\mathbf{x} = \mathbf{b} \\
& \sum_{i \in T} x_i + y = M \\
& \mathbf{x} \ge 0,\ y \ge 0,
\end{aligned}
$$

where $T = \{i : c_i - \mathbf{y}^T \mathbf{a}_i < 0\}$, y is a (scalar) slack variable, and M is a large positive
constant. Show that if k is the index corresponding to the most negative relative
cost coefficient in the original solution, then $(\mathbf{y}, c_k - \mathbf{y}^T \mathbf{a}_k)$ is dual feasible
for the auxiliary problem. Based on this observation, develop a big-M artificial
constraint method for the dual simplex method. (Refer to Exercise 24, Chap. 3.)

19. A textile firm is capable of producing three products: $x_1$, $x_2$, $x_3$. Its production
plan for next month must satisfy the constraints

$$
\begin{aligned}
x_1 + 2x_2 + 2x_3 &\le 12 \\
2x_1 + 4x_2 + x_3 &\le f \\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 &\ge 0.
\end{aligned}
$$



The first constraint is determined by equipment availability and is fixed. The 
second constraint is determined by the availability of cotton. The net profits of 
the products are 2, 3, and 3, respectively, exclusive of the cost of cotton and 
fixed costs. 

(a) Find the shadow price $\lambda_2$ of the cotton input as a function of f. (Hint: Use
the dual simplex method.) Plot $\lambda_2(f)$ and the net profit $z(f)$ exclusive of the
cost for cotton.

(b) The firm may purchase cotton on the open market at a price of 1/6. However,
it may acquire a limited amount at a price of 1/12 from a major supplier
that it purchases from frequently. Determine the net profit of the firm
$\pi(s)$ as a function of s.
20. A certain telephone company would like to determine the maximum number
of long-distance calls from Westburgh to Eastville that it can handle at any one 
time. The company has cables linking these cities via several intermediary cities 
as follows: 




Each cable can handle a maximum number of calls simultaneously as indicated 
in the figure. For example, the number of calls routed from Westburgh to North- 
gate cannot exceed five at any one time. A call from Westburgh to Eastville can 
be routed through any other city, as long as there is a cable available that is not 
currently being used to its capacity. In addition to determining the maximum 
number of calls from Westburgh to Eastville, the company would, of course, 
like to know the optimal routing of these calls. Assume calls can be routed only 
in the directions indicated by the arrows. 

(a) Formulate the above problem as a linear programming problem with upper 
bounds. 

(Hint: Denote by $x_{ij}$ the number of calls routed from city i to city j.)

(b) Find the solution by inspection of the graph. 

21. Apply the maximal flow algorithm to the network below. All arcs have capacity
1 unless otherwise indicated. 




22. Consider the problem 

$$
\begin{aligned}
\text{minimize}\quad & 2x_1 + x_2 + 4x_3 \\
\text{subject to}\quad & x_1 + x_2 + 2x_3 = 3 \\
& 2x_1 + x_2 + 3x_3 = 5 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{aligned}
$$

(a) What is the dual problem?

(b) Note that y = (1, 0) is feasible for the dual. Starting with this y, solve the
primal using the primal-dual algorithm.

23. Show that in the associated restricted dual of the primal-dual method the
objective $y^T b$ can be replaced by $y^T y$.

24. Consider the primal feasible region in standard form $Ax = b$, $x \ge 0$, where A is
an m × n matrix, b is a constant nonzero m-vector, and x is a variable n-vector.

(a) A variable $x_i$ is said to be a null variable if $x_i = 0$ in every feasible solution.
Prove that, if the feasible region is non-empty, $x_i$ is a null variable if and only
if there is a nonzero vector $y \in E^m$ such that $y^T A \ge 0$, $y^T b = 0$, and the ith
component of $y^T A$ is strictly positive.

(b) Strict complementarity. Let the feasible region be nonempty. Then there is a
feasible x and vector $y \in E^m$ such that

$$y^T A \ge 0, \quad y^T b = 0, \quad y^T A + x^T > 0.$$

(c) A variable $x_i$ is a nonextremal variable if $x_i > 0$ in every feasible solution.
Prove that, if the feasible region is non-empty, $x_i$ is a nonextremal variable
if and only if there is $y \in E^m$ and $d \in E^n$ such that $y^T A = d^T$, where
$d_i = -1$, $d_j \ge 0$ for $j \ne i$; and such that $y^T b < 0$.
Chapter 5

Interior-Point Methods 



Linear programs can be viewed in two somewhat complementary ways. They are, 
in one view, a class of continuous optimization problems each with continuous vari- 
ables defined on a convex feasible region and with a continuous objective function. 
They are, therefore, a special case of the general form of problem considered in 
this text. However, linearity implies a certain degree of degeneracy, since for exam- 
ple the derivatives of all functions are constants and hence the differential methods 
of general optimization theory cannot be directly used. From an alternative view, 
linear programs can be considered as a class of combinatorial problems because it 
is known that solutions can be found by restricting attention to the vertices of the 
convex polyhedron defined by the constraints. Indeed, this view is natural when con- 
sidering network problems such as those of early chapters. However, the number of 
vertices may be large, up to $n!/(m!(n - m)!)$, making direct search impossible for even
modest size problems.
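To get a feel for how quickly this bound grows, it can be evaluated directly. The following is a small illustration only; the function name `vertex_upper_bound` and the sample sizes are chosen here for the example and do not come from the text.

```python
import math


def vertex_upper_bound(n, m):
    """Number of ways to choose m basic columns out of n: n! / (m! (n - m)!)."""
    return math.comb(n, m)


# A few sample problem sizes (n variables, m equality constraints).
for n, m in [(6, 3), (20, 10), (100, 50)]:
    print(n, m, vertex_upper_bound(n, m))
```

Even at n = 100 and m = 50 the bound has about thirty digits, which is why enumerating vertices is not a practical strategy.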

The simplex method embodies both of these viewpoints, for it restricts attention 
to vertices, but exploits the continuous nature of the variables to govern the progress 
from one vertex to another, defining a sequence of adjacent vertices with improving 
values of the objective as the process reaches an optimal point. The simplex method, 
with ever-evolving improvements, has for five decades provided an efficient general 
method for solving linear programs. 

Although it performs well in practice, visiting only a small fraction of the total 
number of vertices, a definitive theory of the simplex method’s performance was 
unavailable. However, in 1972, Klee and Minty showed by examples that for certain 
linear programs the simplex method will examine every vertex.¹ These examples
proved that in the worst case, the simplex method requires a number of steps that is 
exponential in the size of the problem. 



In view of this result, many researchers believed that a good algorithm, different
from the simplex method, might be devised whose number of steps would be
polynomial rather than exponential in the program's size; that is, the time required
to compute the solution would be bounded above by a polynomial in the size of the
problem.

Indeed, in 1979, a new approach to linear programming, Khachiyan's ellipsoid
method, was announced with great acclaim. The method is quite different in structure
from the simplex method, for it constructs a sequence of shrinking ellipsoids,
each of which contains the optimal solution set and each of which is smaller in
volume than its predecessor by at least a certain fixed factor. Therefore, the
solution set can be found to any desired degree of approximation by continuing
the process. Khachiyan proved that the ellipsoid method, developed during
the 1970s by other mathematicians, is a polynomial-time algorithm for linear
programming.

Practical experience, however, was disappointing. In almost all cases, the simplex 
method was much faster than the ellipsoid method. However, Khachiyan’s ellipsoid 
method showed that polynomial time algorithms for linear programming do exist. 
It left open the question of whether one could be found that, in practice, was faster 
than the simplex method. 

It is then perhaps not surprising that the announcement by Karmarkar in 1984 
of a new polynomial time algorithm, an interior-point method, with the potential 
to improve the practical effectiveness of the simplex method made front-page news 
in major newspapers and magazines throughout the world. It is this interior-point 
approach that is the subject of this chapter and the next. 

This chapter begins with a brief introduction to complexity theory, which is the 
basis for a way to quantify the performance of iterative algorithms, distinguishing 
polynomial-time algorithms from others. 

Next the example of Klee and Minty showing that the simplex method is not 
a polynomial-time algorithm in the worst case is presented. Following that the 
ellipsoid algorithm is defined and shown to be a polynomial-time algorithm. These 
two sections provide a deeper understanding of how the modern theory of linear 
programming evolved, and help make clear how complexity theory impacts linear 
programming. However, the reader may wish to consider them optional and omit 
them at first reading. 

The development of the basics of interior-point theory begins with Sect. 5.4 
which introduces the concept of barrier functions and the analytic center. Section 5.5 
introduces the central path which underlies interior-point algorithms. The relations 
between primal and dual in this context are examined. An overview of the details
of specific interior-point algorithms based on the theory is presented in Sects. 5.6
and 5.7.



1 We will be more precise about complexity notions such as “polynomial algorithm” in Sect. 5.1 
below. 
5.1 Elements of Complexity Theory



Complexity theory is arguably the foundation for analysis of computer algorithms. 
The goal of the theory is twofold: to develop criteria for measuring the effectiveness 
of various algorithms (and thus, be able to compare algorithms using these criteria), 
and to assess the inherent difficulty of various problems. 

The term complexity refers to the amount of resources required by a computa- 
tion. In this chapter we focus on a particular resource, namely, computing time. In 
complexity theory, however, one is not interested in the execution time of a pro- 
gram implemented in a particular programming language, running on a particular 
computer over a particular input. This involves too many contingent factors. In- 
stead, one wishes to associate to an algorithm more intrinsic measures of its time 
requirements. 

Roughly speaking, to do so one needs to define: 

• a notion of input size , 

• a set of basic operations , and 

• a cost for each basic operation. 

The last two allow one to associate a cost with a computation. If x is any input, the
cost C(x) of the computation with input x is the sum of the costs of all the basic
operations performed during this computation.

Let $\mathcal{A}$ be an algorithm and $\mathcal{J}_n$ be the set of all its inputs having size n. The
worst-case cost function of $\mathcal{A}$ is the function $T^w_{\mathcal{A}}$ defined by

$$T^w_{\mathcal{A}}(n) = \sup_{x \in \mathcal{J}_n} C(x).$$

If there is a probability structure on $\mathcal{J}_n$ it is possible to define the average-case cost
function $T^a_{\mathcal{A}}$ given by

$$T^a_{\mathcal{A}}(n) = E_n(C(x)),$$

where $E_n$ is the expectation over $\mathcal{J}_n$. However, the average is usually more difficult
to find, and there is of course the issue of what probabilities to assign.
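The two cost functions can be made concrete on a toy algorithm. The sketch below is an illustration under assumptions that are not in the text: the algorithm is a linear search for the value 0, the basic operation is one comparison at unit cost, the inputs of size n are all orderings of 0, ..., n-1, and the probability structure is uniform over those orderings.

```python
import itertools
from fractions import Fraction


def linear_search_cost(x, target):
    """C(x): number of comparisons made before the target is found."""
    cost = 0
    for value in x:
        cost += 1            # one comparison, charged at unit cost
        if value == target:
            break
    return cost


def worst_and_average_cost(n):
    """Return (worst-case cost, average-case cost) over all inputs of size n."""
    inputs = list(itertools.permutations(range(n)))
    costs = [linear_search_cost(p, 0) for p in inputs]
    worst = max(costs)                            # supremum over the input set
    average = Fraction(sum(costs), len(costs))    # expectation, uniform weights
    return worst, average


print(worst_and_average_cost(4))
```

For this toy case the worst-case cost is n and the average is (n + 1)/2; a different choice of probabilities would change the average but not the worst case.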

We now discuss how the objects in the three items above are selected. The selec- 
tion of a set of basic operations is generally easy. For the algorithms we consider 
in this chapter, the obvious choice is the set {+, -, x, /, <} of the four arithmetic 
operations and the comparison. Selecting a notion of input size and a cost for the 
basic operations depends on the kind of data dealt with by the algorithm. Some 
kinds can be represented within a fixed amount of computer memory; others require 
a variable amount. 

Examples of the first are fixed-precision floating-point numbers, stored in a fixed 
amount of memory (usually 32 or 64 bits). For this kind of data the size of an 
element is usually taken to be 1 and consequently to have unit size per number. 

Examples of the second are integer numbers which require a number of bits 
approximately equal to the logarithm of their absolute value. This (base 2) logarithm 
is usually referred to as the bit size of the integer. Similar ideas apply for rational 
numbers. 
Let A be some kind of data and $x = (x_1, \ldots, x_n) \in A^n$. If A is of the first kind
above then we define size(x) = n. Otherwise, we define $\mathrm{size}(x) = \sum_{i=1}^{n} \text{bit-size}(x_i)$.

The cost of operating on two unit-size numbers is taken to be 1 and is called the
unit cost. In the bit-size case, the cost of operating on two numbers is the product of
their bit sizes (for multiplications and divisions) or their maximum (for additions,
subtractions, and comparisons).
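These definitions translate directly into a few lines of code. The sketch below is illustrative: the function names are invented for the example, and the bit size is taken to be the length of the binary representation of the absolute value (with 0 counted as one bit), which is one common convention consistent with "approximately the logarithm".

```python
def bit_size(k):
    """Bits needed to write |k| in binary (0 is counted as one bit)."""
    return max(1, abs(k).bit_length())


def size(x):
    """Input size of an integer vector in the bit model."""
    return sum(bit_size(v) for v in x)


def bit_cost(a, b, op):
    """Cost of one arithmetic operation or comparison on integers a and b."""
    if op in ("*", "/"):
        return bit_size(a) * bit_size(b)      # product of the bit sizes
    return max(bit_size(a), bit_size(b))      # +, -, and comparisons


print(size([7, 1000]), bit_cost(7, 1000, "*"), bit_cost(7, 1000, "+"))
```

In the unit-cost model the same vector would simply have size 2 and every operation would cost 1.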

The consideration of integer or rational data with their associated bit size and 
bit cost for the arithmetic operations is usually referred to as the Turing model of 
computation. The consideration of idealized reals with unit size and unit cost is 
today referred to as the real number arithmetic model. When comparing algorithms,
one should make clear which model of computation is used to derive complexity 
bounds. 

A basic concept related to both models of computation is that of polynomial 
time. An algorithm $\mathcal{A}$ is said to be a polynomial time algorithm if $T^w_{\mathcal{A}}(n)$ is bounded
above by a polynomial in n. A problem can be solved in polynomial time if there is a
polynomial time algorithm solving the problem. The notion of average polynomial
time is defined similarly, replacing $T^w_{\mathcal{A}}$ by $T^a_{\mathcal{A}}$.

The notion of polynomial time is usually taken as the formalization of efficiency 
in complexity theory. 



*5.2 The Simplex Method Is Not Polynomial-Time

When the simplex method is used to solve a linear program in standard form with
coefficient matrix $A \in E^{m \times n}$, $b \in E^m$ and $c \in E^n$, the number of pivot steps to solve
the problem starting from a basic feasible solution is typically a small multiple of
m: usually between 2m and 3m. In fact, Dantzig observed that for problems with
m < 50 and n < 200 the number of iterations is ordinarily less than 1.5m.

At one time researchers believed — and attempted to prove — that the simplex 
algorithm (or some variant thereof) always requires a number of iterations that is 
bounded by a polynomial expression in the problem size. That was until Victor Klee 
and George Minty exhibited a class of linear programs each of which requires an 
exponential number of iterations when solved by the conventional simplex method. 

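One form of the Klee-Minty example is

$$
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{n} 10^{\,n-j} x_j \\
\text{subject to}\quad & 2\sum_{j=1}^{i-1} 10^{\,i-j} x_j + x_i \le 100^{\,i-1}, \quad i = 1, \ldots, n \\
& x_j \ge 0, \quad j = 1, \ldots, n.
\end{aligned}
\tag{5.1}
$$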



The problem above is easily cast as a linear program in standard form. 
A specific case is that for n = 3, giving

$$
\begin{aligned}
\text{maximize}\quad & 100x_1 + 10x_2 + x_3 \\
\text{subject to}\quad & x_1 \le 1 \\
& 20x_1 + x_2 \le 100 \\
& 200x_1 + 20x_2 + x_3 \le 10{,}000 \\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{aligned}
$$



In this case, we have three constraints and three variables (along with their
nonnegativity constraints). After adding slack variables, the problem is in standard form.
The system has m = 3 equations and n = 6 nonnegative variables. It can be verified
that it takes $2^3 - 1 = 7$ pivot steps to solve the problem with the simplex method
when at each step the pivot column is chosen to be the one with the largest (because
this is a maximization problem) reduced cost. (See Exercise 1.)
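The pivot count can be checked numerically. The sketch below is a bare-bones tableau simplex written for this illustration, not a production solver: it assumes the problem is given as maximize c^T x subject to Ax <= b, x >= 0 with b >= 0 (so the slack basis is feasible), uses exact fractions, enters the column with the largest reduced cost, and leaves by the minimum-ratio test. The function names are invented for the example.

```python
from fractions import Fraction


def klee_minty(n):
    """Data (A, b, c) of the Klee-Minty problem of size n, in inequality form."""
    c = [Fraction(10) ** (n - j) for j in range(1, n + 1)]
    A, b = [], []
    for i in range(1, n + 1):
        row = [Fraction(0)] * n
        for j in range(1, i):
            row[j - 1] = 2 * Fraction(10) ** (i - j)
        row[i - 1] = Fraction(1)
        A.append(row)
        b.append(Fraction(100) ** (i - 1))
    return A, b, c


def simplex_largest_coefficient(A, b, c):
    """Return (optimal value, number of pivots) using the largest-coefficient rule."""
    m, n = len(A), len(c)
    # Tableau rows [A | I | b]; the slack variables form the starting basis.
    T = [list(A[i]) + [Fraction(int(i == k)) for k in range(m)] + [b[i]]
         for i in range(m)]
    z = list(c) + [Fraction(0)] * (m + 1)   # reduced costs; last entry is -objective
    pivots = 0
    while True:
        col = max(range(n + m), key=lambda j: z[j])
        if z[col] <= 0:
            return -z[-1], pivots           # no improving column: optimal
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("problem is unbounded")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        f = z[col]
        z = [a - f * p for a, p in zip(z, T[row])]
        pivots += 1


value, pivots = simplex_largest_coefficient(*klee_minty(3))
print(value, pivots)
```

For n = 3 the run ends at the vertex x = (0, 0, 10000) after visiting every vertex of the feasible region once.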

The general problem of the class (5.1) takes $2^n - 1$ pivot steps and this is in fact
the number of vertices minus one (which is the starting vertex). To get an idea of
how bad this can be, consider the case where n = 50. We have $2^{50} - 1 \approx 10^{15}$. In
a year with 365 days, there are approximately $3 \times 10^7$ seconds. If a computer ran
continuously, performing a million pivots of the simplex algorithm per second, it would
take approximately

$$\frac{10^{15}}{3 \times 10^{7} \times 10^{6}} \approx 33 \text{ years}$$

to solve a problem of this class using the greedy pivot selection rule.
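The estimate is easy to reproduce without rounding. The short calculation below uses the same assumptions as the text (365-day year, one million pivots per second); the variable names are illustrative.

```python
pivots = 2**50 - 1                      # pivot steps for n = 50
seconds_per_year = 365 * 24 * 60 * 60   # 31,536,000 seconds
pivots_per_second = 10**6

years = pivots / (pivots_per_second * seconds_per_year)
print(round(years, 1))
```

With the unrounded figures the result is a little under 36 years, in line with the rounded estimate of roughly 33 years.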

Although it is not polynomial in the worst case, the simplex method remains
one of the major solvers for linear programming. In fact, the method has recently
been proved to be (strongly) polynomial for solving the Markov Decision Process with
any fixed discount rate.



*5.3 The Ellipsoid Method

The basic ideas of the ellipsoid method stem from research done in the 1960s and 
1970s mainly in the Soviet Union (as it was then called) by others who preceded 
Khachiyan. In essence, the idea is to enclose the region of interest in ever smaller 
ellipsoids. 

The significant contribution of Khachiyan was to demonstrate that under certain
assumptions, the ellipsoid method constitutes a polynomially bounded algorithm
for linear programming.

The version of the method discussed here is really aimed at finding a point of a
polyhedral set $\Omega$ given by a system of linear inequalities,

$$\Omega = \{y \in E^m : y^T a_j \le c_j,\ j = 1, \ldots, n\}.$$

Finding a point of $\Omega$ can be thought of as equivalent to solving a linear programming
problem.
Two important assumptions are made regarding this problem: 

(A1) There is a vector $y_0 \in E^m$ and a scalar R > 0 such that the closed ball $S(y_0, R)$
with center $y_0$ and radius R, that is

$$\{y \in E^m : |y - y_0| \le R\},$$

contains $\Omega$.

(A2) If $\Omega$ is nonempty, there is a known scalar r > 0 such that $\Omega$ contains a ball
of the form $S(y^*, r)$ with center at $y^*$ and radius r. (This assumption implies
that if $\Omega$ is nonempty, then it has a nonempty interior and its volume is at least
vol(S(0, r)).)²

Definition. An ellipsoid in $E^m$ is a set of the form

$$E = \{y \in E^m : (y - z)^T Q (y - z) \le 1\}$$

where $z \in E^m$ is a given point (called the center) and Q is a positive definite matrix (see
Sect. A.4 of Appendix A) of dimension m × m. This ellipsoid is denoted E(z, Q).

The unit sphere S(0, 1) centered at the origin 0 is a special ellipsoid with Q = I, the
identity matrix.

The axes of a general ellipsoid are the eigenvectors of Q and the lengths of the
axes are $\lambda_1^{-1/2}, \lambda_2^{-1/2}, \ldots, \lambda_m^{-1/2}$, where the $\lambda_i$'s are the corresponding eigenvalues.
It can be shown that the volume of an ellipsoid is

$$\mathrm{vol}(E) = \mathrm{vol}(S(0, 1)) \prod_{i=1}^{m} \lambda_i^{-1/2} = \mathrm{vol}(S(0, 1)) \det(Q^{-1/2}).$$



Cutting Plane and New Containing Ellipsoid 

In the ellipsoid method, a series of ellipsoids $E_k$ is defined, with centers $y_k$ and with
the defining $Q = B_k^{-1}$, where $B_k$ is symmetric and positive definite.

At each iteration of the algorithm, we have $\Omega \subset E_k$. It is then possible to check
whether $y_k \in \Omega$. If so, we have found an element of $\Omega$ as required. If not, there is at
least one constraint that is violated. Suppose $a_j^T y_k > c_j$. Then

$$\Omega \subset \tfrac{1}{2}E_k = \{y \in E_k : a_j^T y \le a_j^T y_k\}.$$

This set is half of the ellipsoid, obtained by cutting the ellipsoid in half through its
center (Fig. 5.1).



2 The (topological) interior of any set $\Omega$ is the set of points in $\Omega$ which are the centers of some balls
contained in $\Omega$.
The successor ellipsoid $E_{k+1}$ is defined to be the minimal-volume ellipsoid
containing $\tfrac{1}{2}E_k$. It is constructed as follows. Define

$$\tau = \frac{1}{m + 1}, \quad \delta = \frac{m^2}{m^2 - 1}, \quad \sigma = 2\tau.$$

Fig. 5.1 A half-ellipsoid

Then put

$$
y_{k+1} = y_k - \frac{\tau}{(a_j^T B_k a_j)^{1/2}}\, B_k a_j, \qquad
B_{k+1} = \delta \left( B_k - \sigma\, \frac{B_k a_j a_j^T B_k}{a_j^T B_k a_j} \right).
\tag{5.2}
$$



Theorem 1. The ellipsoid $E_{k+1} = E(y_{k+1}, B_{k+1}^{-1})$ defined as above is the ellipsoid of least
volume containing $\tfrac{1}{2}E_k$. Moreover,

$$
\frac{\mathrm{vol}(E_{k+1})}{\mathrm{vol}(E_k)} = \left( \frac{m^2}{m^2 - 1} \right)^{(m-1)/2} \frac{m}{m + 1} < \exp\left( -\frac{1}{2(m + 1)} \right) < 1.
$$

Proof. We shall not prove the statement about the new ellipsoid being of least
volume, since that is not necessary for the results that follow. To prove the remainder
of the statement, we have

$$\frac{\mathrm{vol}(E_{k+1})}{\mathrm{vol}(E_k)} = \frac{\det(B_{k+1}^{1/2})}{\det(B_k^{1/2})}.$$

For simplicity, by a change of coordinates, we may take $B_k = I$. Then $B_{k+1}$ has m - 1
eigenvalues equal to $\delta = \frac{m^2}{m^2 - 1}$ and one eigenvalue equal to
$\delta - 2\delta\tau = \frac{m^2}{m^2 - 1}\left(1 - \frac{2}{m + 1}\right) = \left(\frac{m}{m + 1}\right)^2$.
The reduction in volume is the product of the square roots of these, giving
the equality in the theorem.
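Then using $(1 + x)^p \le e^{xp}$, we have

$$
\left( \frac{m^2}{m^2 - 1} \right)^{(m-1)/2} \frac{m}{m + 1}
= \left( 1 + \frac{1}{m^2 - 1} \right)^{(m-1)/2} \left( 1 - \frac{1}{m + 1} \right)
< \exp\left( \frac{1}{2(m + 1)} - \frac{1}{m + 1} \right)
= \exp\left( -\frac{1}{2(m + 1)} \right).
$$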
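The bound of Theorem 1 is easy to confirm numerically for particular dimensions. The following check is only an illustration; the function names are chosen here, and m must be at least 2 for the ratio formula to be defined.

```python
import math


def volume_ratio(m):
    """Exact ratio vol(E_{k+1}) / vol(E_k) from Theorem 1 (requires m >= 2)."""
    return (m * m / (m * m - 1.0)) ** ((m - 1) / 2.0) * m / (m + 1.0)


def exponential_bound(m):
    """Upper bound exp(-1 / (2 (m + 1))) on the volume ratio."""
    return math.exp(-1.0 / (2 * (m + 1)))


for m in (2, 3, 10, 100):
    print(m, volume_ratio(m), exponential_bound(m))
```

The ratio approaches 1 as m grows, which is why the number of iterations needed to halve the volume is proportional to m.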



Convergence 

The ellipsoid method is initiated by selecting $y_0$ and $R$ such that condition (A1) is
satisfied. Then $B_0 = R^2 I$, and the corresponding $E_0$ contains $\Omega$. The updating of the
$E_k$'s is continued until a solution is found.

Under the assumptions stated above, the ellipsoid method reduces the volume of an
ellipsoid to one-half of its initial value in $O(m)$ iterations.
(See Appendix A for $O$ notation.) Hence it can reduce the volume to less than that
of a sphere of radius $r$ in $O(m^2 \log(R/r))$ iterations, since the volume of $\Omega$ is bounded
from below by $\mathrm{vol}(S(0,1))\,r^m$ and the initial volume is $\mathrm{vol}(S(0,1))\,R^m$. Generally
a single iteration requires $O(m^2)$ arithmetic operations. Hence the entire process
requires $O(m^4 \log(R/r))$ arithmetic operations.³
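The update (5.2) and the stopping rule can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not a production implementation: the set is taken as $\Omega = \{y : A^T y \le c\}$ with the columns of $A$ playing the role of the $a_j$, the initial center is the origin, the most violated constraint is chosen for the cut, and $m \ge 2$. The function name is hypothetical.

```python
import numpy as np

def ellipsoid_feasibility(A, c, R, max_iter=10_000):
    """Search for y with A.T @ y <= c, starting from the ball of radius R at 0.

    A is m x n (columns a_j), c has length n, and m >= 2 is assumed.
    Returns (y, iterations) on success or (None, max_iter) otherwise.
    """
    m = A.shape[0]
    tau = 1.0 / (m + 1)
    delta = m * m / (m * m - 1.0)
    sigma = 2.0 * tau
    y = np.zeros(m)
    B = R * R * np.eye(m)                  # E_0 is the ball of radius R
    for k in range(max_iter):
        violation = A.T @ y - c
        j = int(np.argmax(violation))      # most violated constraint (a choice made here)
        if violation[j] <= 0:
            return y, k                    # center is feasible
        a = A[:, j]
        Ba = B @ a
        aBa = float(a @ Ba)
        y = y - tau * Ba / np.sqrt(aBa)                     # new center, (5.2)
        B = delta * (B - sigma * np.outer(Ba, Ba) / aBa)    # new shape matrix, (5.2)
    return None, max_iter
```

For example, the box $2 \le y_1, y_2 \le 3$ can be written with `A = [[1, 0, -1, 0], [0, 1, 0, -1]]` and `c = [3, 3, -2, -2]`; with `R = 10` the volume argument above guarantees a feasible center within about two dozen iterations.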



Ellipsoid Method for Usual Form of LP 



Now consider the linear program (where $A$ is $m \times n$)

$$(P) \qquad \begin{array}{ll} \text{maximize} & c^T x \\ \text{subject to} & Ax \le b, \; x \ge 0 \end{array}$$

and its dual

$$(D) \qquad \begin{array}{ll} \text{minimize} & y^T b \\ \text{subject to} & y^T A \ge c^T, \; y \ge 0. \end{array}$$

Note that both problems can be solved by finding a feasible point to inequalities

$$\begin{array}{r} -c^T x + b^T y \le 0 \\ Ax \le b \\ -A^T y \le -c \\ x, y \ge 0, \end{array} \tag{5.3}$$

where both $x$ and $y$ are variables. Thus, the total number of arithmetic operations
for solving a linear program is bounded by $O((m+n)^4 \log(R/r))$.



3 Assumption (A2) is sometimes too strong. It has been shown, however, that when the data consists
of integers, it is possible to perturb the problem so that (A2) is satisfied and if the perturbed problem
has a feasible solution, so does the original $\Omega$.
5.4 The Analytic Center



The new interior-point algorithms introduced by Karmarkar move by successive 
steps inside the feasible region. It is the interior of the feasible set rather than the ver- 
tices and edges that plays a dominant role in this type of algorithm. In fact, these 
algorithms purposely avoid the edges of the set, only eventually converging to one 
as a solution. 

Our study of these algorithms begins in the next section, but it is useful at this 
point to introduce a concept that definitely focuses on the interior of a set, termed 
the set’s analytic center. As the name implies, the center is away from the edge. 

In addition, the study of the analytic center introduces a special structure, termed 
a barrier or potential that is fundamental to interior-point methods. 

Consider a set $S$ in a subset $X$ of $E^n$ defined by a group of inequalities as

$$S = \{x \in X : g_j(x) \ge 0, \; j = 1, 2, \ldots, m\},$$

and assume that the functions $g_j$ are continuous. $S$ has a nonempty interior
$\mathring{S} = \{x \in X : g_j(x) > 0, \text{ all } j\}$. Associated with this definition of the set is the potential
function

$$\psi(x) = -\sum_{j=1}^{m} \log g_j(x)$$

defined on $\mathring{S}$.

The analytic center of $S$ is the vector (or set of vectors) that minimizes the
potential; that is, the vector (or vectors) that solve

$$\min \psi(x) = \min \left\{ -\sum_{j=1}^{m} \log g_j(x) : x \in X, \; g_j(x) > 0 \text{ for each } j \right\}.$$

Example 1 (A Cube). Consider the set $S$ defined by $x_i \ge 0$, $(1 - x_i) \ge 0$, for $i =
1, 2, \ldots, n$. This is $S = [0, 1]^n$, the unit cube in $E^n$. The analytic center can be found
by differentiation to be $x_i = 1/2$, for all $i$. Hence, the analytic center is identical to
what one would normally call the center of the unit cube.

In general, the analytic center depends on how the set is defined, that is, on the
particular inequalities used in the definition. For instance, the unit cube is also defined by
the inequalities $x_i \ge 0$, $(1 - x_i)^d \ge 0$ with odd $d > 1$. In this case the solution is
$x_i = 1/(d + 1)$ for all $i$. For large $d$ this point is near the inner corner of the unit cube.

Also, the addition of redundant inequalities can change the location of the
analytic center. For example, repeating a given inequality will change the center’s
location.

There are several sets associated with linear programs for which the analytic
center is of particular interest. One such set is the feasible region itself. Another is
the set of optimal solutions. There are also sets associated with dual and primal-dual
formulations. All of these are related in important ways.
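The dependence of the analytic center on the representation, as in Example 1, can be checked numerically. The sketch below (an illustration only; the function name is hypothetical) minimizes the one-dimensional potential $-\log x - d \log(1 - x)$ on $(0, 1)$ by bisection on its derivative, which is increasing because the potential is convex. Since the cube's potential separates by coordinate, one coordinate suffices.

```python
def analytic_center_1d(d, tol=1e-12):
    """Minimize -log(x) - d*log(1 - x) over (0, 1) by bisection on the derivative."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        slope = -1.0 / mid + d / (1.0 - mid)   # derivative of the potential at mid
        if slope < 0:
            lo = mid                            # minimizer lies to the right
        else:
            hi = mid                            # minimizer lies to the left (or at mid)
    return 0.5 * (lo + hi)

for d in (1, 2, 3, 9):
    print(d, analytic_center_1d(d), 1.0 / (d + 1))
```

The case `d = 2` corresponds to listing the inequality $1 - x_i \ge 0$ twice, which moves the center from $1/2$ to $1/3$ even though the set itself is unchanged.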
Let us illustrate by considering the analytic center associated with a bounded
polytope $\Omega$ in $E^m$ represented by $n$ ($> m$) linear inequalities; that is,

$$\Omega = \{y \in E^m : c^T - y^T A \ge 0\},$$

where $A \in E^{m \times n}$ and $c \in E^n$ are given and $A$ has rank $m$. Denote the interior of $\Omega$ by

$$\mathring{\Omega} = \{y \in E^m : c^T - y^T A > 0\}.$$

The potential function for this set is

$$\psi_\Omega(y) \equiv -\sum_{j=1}^{n} \log(c_j - y^T a_j) = -\sum_{j=1}^{n} \log s_j, \tag{5.4}$$

where $s \equiv c - A^T y$ is a slack vector. Hence the potential function is the negative
sum of the logarithms of the slack variables.

The analytic center of $\Omega$ is the interior point of $\Omega$ that minimizes the potential
function. This point is denoted by $y^a$ and has the associated $s^a = c - A^T y^a$.
The pair $(y^a, s^a)$ is uniquely defined, since the potential function is strictly convex
(see Sect. 7.4) in the bounded convex set $\Omega$.

Setting to zero the derivatives of $\psi(y)$ with respect to each $y_i$ gives

$$\sum_{j=1}^{n} \frac{a_{ij}}{c_j - y^T a_j} = 0, \quad \text{for all } i,$$

which can be written

$$\sum_{j=1}^{n} \frac{a_{ij}}{s_j} = 0, \quad \text{for all } i.$$

Now define $x_j = 1/s_j$ for each $j$. We introduce the notation

$$x \circ s \equiv (x_1 s_1, x_2 s_2, \ldots, x_n s_n)^T,$$

which is component multiplication. Then the analytic center is defined by the
conditions

$$\begin{array}{r} x \circ s = \mathbf{1} \\ Ax = 0 \\ A^T y + s = c. \end{array}$$
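These conditions can be verified numerically. The following Python sketch (assuming NumPy; the function names, the backtracking rule, and the tolerances are illustrative choices, not taken from the text) minimizes the potential (5.4) with a damped Newton iteration from a strictly interior starting point and returns $y$, $s$, and $x_j = 1/s_j$.

```python
import numpy as np

def potential(A, c, y):
    """Negative sum of log slacks; +inf outside the interior."""
    s = c - A.T @ y
    return np.inf if np.any(s <= 0) else -np.sum(np.log(s))

def analytic_center(A, c, y0, tol=1e-9, max_iter=100):
    """Damped Newton iteration for the analytic center of {y : c - A^T y >= 0}.

    y0 must be strictly interior. Returns (y, s, x) with x_j = 1 / s_j.
    """
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        s = c - A.T @ y
        x = 1.0 / s
        grad = A @ x                        # gradient of the potential is A x
        if np.linalg.norm(grad) < tol:
            break
        hess = (A * x**2) @ A.T             # A diag(1/s^2) A^T
        step = -np.linalg.solve(hess, grad)
        t, f0, slope = 1.0, potential(A, c, y), grad @ step
        while t > 1e-12 and potential(A, c, y + t * step) > f0 + 0.25 * t * slope:
            t *= 0.5                        # backtrack: stay interior and decrease
        y = y + t * step
    s = c - A.T @ y
    return y, s, 1.0 / s
```

For the unit square written as $y_1 \le 1$, $y_2 \le 1$, $-y_1 \le 0$, $-y_2 \le 0$, the iteration converges to $(1/2, 1/2)$, and the returned `x` satisfies `A @ x` close to zero, in agreement with the condition $Ax = 0$.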

The analytic center can be defined when the interior is empty or equalities are
present, such as

$$\Omega = \{y \in E^m : c^T - y^T A \ge 0, \; By = b\}.$$

In this case the analytic center is chosen on the linear surface $\{y : By = b\}$ to
maximize the product of the slack variables $s = c - A^T y$. Thus, in this context
the interior of $\Omega$ refers to the interior of the positive orthant of slack variables:
$R^n_+ \equiv \{s : s \ge 0\}$. This definition of interior depends only on the region of the slack
variables. Even if there is only a single point in $\Omega$ with $s = c - A^T y$ for some $y$
where $By = b$ with $s > 0$, we still say that $\mathring{\Omega}$ is not empty.



5.5 The Central Path 



The concept underlying interior-point methods for linear programming is to use 
nonlinear programming techniques of analysis and methodology. The analysis is 
often based on differentiation of the functions defining the problem. Traditional 
linear programming does not require these techniques since the defining functions 
are linear. Duality in general nonlinear programs is typically manifested through 
Lagrange multipliers (which are called dual variables in linear programming). The 
analysis and algorithms of the remaining sections of the chapter use these nonlin- 
ear techniques. These techniques are discussed systematically in later chapters, so 
rather than treat them in detail at this point, these current sections provide only 
minimal detail in their application to linear programming. It is expected that most 
readers are already familiar with the basic method for minimizing a function by set- 
ting its derivative to zero, and for incorporating constraints by introducing Lagrange 
multipliers. These methods are discussed in detail in Chaps. 11-15.

The computational algorithms of nonlinear programming are typically iterative 
in nature, often characterized as search algorithms. At any step with a given point, 
a direction for search is established and then a move in that direction is made to 
define the next point. There are many varieties of such search algorithms and they 
are systematically presented throughout the text. In this chapter, we use versions of 
Newton’s method as the search algorithm, but we postpone a detailed study of the 
method until later chapters. 

Not only have nonlinear methods improved linear programming, but interior- 
point methods for linear programming have been extended to provide new ap- 
proaches to nonlinear programming. This chapter is intended to show how this 
merger of linear and nonlinear programming produces elegant and effective methods. 
These ideas take an especially pleasing form when applied to linear programming. 
Study of them here, even without all the detailed analysis, should provide good 
intuitive background for the more general manifestations. 

Consider a primal linear program in standard form

$$(LP) \qquad \begin{array}{ll} \text{minimize} & c^T x \\ \text{subject to} & Ax = b, \; x \ge 0. \end{array} \tag{5.5}$$

We denote the feasible region of this program by $\mathcal{F}_p$. We assume that
$\mathring{\mathcal{F}}_p = \{x : Ax = b, \; x > 0\}$ is nonempty and the optimal solution set of the problem is bounded.
Associated with this problem, we define for $\mu \ge 0$ the barrier problem

$$(BP) \qquad \begin{array}{ll} \text{minimize} & c^T x - \mu \sum_{j=1}^{n} \log x_j \\ \text{subject to} & Ax = b, \; x > 0. \end{array} \tag{5.6}$$

It is clear that $\mu = 0$ corresponds to the original problem (5.5). As $\mu \to \infty$, the
solution approaches the analytic center of the feasible region (when it is bounded),
since the barrier term swamps out $c^T x$ in the objective. As $\mu$ is varied continuously
toward 0, there is a path $x(\mu)$ defined by the solution to (BP). This path $x(\mu)$ is
termed the primal central path. As $\mu \to 0$ this path converges to the analytic center
of the optimal face $\{x : c^T x = z^*, \; Ax = b, \; x \ge 0\}$, where $z^*$ is the optimal value of
(LP).

A strategy for solving (LP) is to solve (BP) for smaller and smaller values of $\mu$
and thereby approach a solution to (LP). This is indeed the basic idea of interior-point
methods.

At any $\mu > 0$, under the assumptions that we have made for problem (5.5), the
necessary and sufficient conditions for a unique and bounded solution are obtained
by introducing a Lagrange multiplier vector $y$ for the linear equality constraints to
form the Lagrangian (see Chap. 11)

$$c^T x - \mu \sum_{j=1}^{n} \log x_j - y^T(Ax - b).$$

The derivatives with respect to the $x_j$'s are set to zero, leading to the conditions

$$c_j - \mu/x_j - y^T a_j = 0, \quad \text{for each } j,$$

or equivalently

$$\mu X^{-1}\mathbf{1} + A^T y = c, \tag{5.7}$$

where as before $a_j$ is the $j$th column of $A$, $\mathbf{1}$ is the vector of 1's, and $X$ is the diagonal
matrix whose diagonal entries are the components of $x > 0$. Setting $s_j = \mu/x_j$ the
complete set of conditions can be rewritten

$$\begin{array}{r} x \circ s = \mu\mathbf{1} \\ Ax = b \\ A^T y + s = c. \end{array} \tag{5.8}$$

Note that $y$ is a dual feasible solution and $c - A^T y > 0$ (see Exercise 4).

Example 2 (A Square Primal). Consider the problem of maximizing $x_1$ within the
unit square $S = [0, 1]^2$. The problem is formulated as

$$\begin{array}{ll} \min & -x_1 \\ \text{s.t.} & x_1 + x_3 = 1 \\ & x_2 + x_4 = 1 \\ & x_1 \ge 0, \; x_2 \ge 0, \; x_3 \ge 0, \; x_4 \ge 0. \end{array}$$

Here $x_3$ and $x_4$ are slack variables for the original problem to put it in standard
form. The optimality conditions for $x(\mu)$ consist of the original two linear constraint
equations and the four equations

$$y_1 + s_1 = -1, \quad y_2 + s_2 = 0, \quad y_1 + s_3 = 0, \quad y_2 + s_4 = 0$$

together with the relations $s_i = \mu/x_i$ for $i = 1, 2, \ldots, 4$. These equations are readily
solved with a series of elementary variable eliminations to find

$$x_1(\mu) = \frac{1 - 2\mu \pm \sqrt{1 + 4\mu^2}}{2}, \qquad x_2(\mu) = 1/2.$$

Using the “+” solution, it is seen that as $\mu \to 0$ the solution goes to $x \to (1, 1/2)$.
Note that this solution is not a corner of the cube. Instead it is at the analytic center
of the optimal face $\{x : x_1 = 1, \; 0 \le x_2 \le 1\}$. See Fig. 5.2. The limit of $x(\mu)$ as
$\mu \to \infty$ can be seen to be the point $(1/2, 1/2)$. Hence, the central path in this case is
a straight line progressing from the analytic center of the square (at $\mu \to \infty$) to the
analytic center of the optimal face (at $\mu \to 0$).
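The closed-form path of Example 2 is easy to check numerically. The Python sketch below (an illustration only; the function name is hypothetical) evaluates the “+” solution for a given $\mu$, recovers $s$ from $s_i = \mu/x_i$ and $y$ from the last two dual equations, and lets one confirm the remaining optimality conditions and the two limits.

```python
import math

def square_central_path(mu):
    """Return (x, y, s) on the central path of Example 2 for a given mu > 0."""
    x1 = (1.0 - 2.0 * mu + math.sqrt(1.0 + 4.0 * mu * mu)) / 2.0   # the "+" root
    x2 = 0.5
    x = [x1, x2, 1.0 - x1, 1.0 - x2]       # x3, x4 are the slack variables
    s = [mu / xi for xi in x]              # s_i = mu / x_i
    y = [-s[2], -s[3]]                     # from y1 + s3 = 0 and y2 + s4 = 0
    return x, y, s

for mu in (100.0, 1.0, 0.01):
    x, y, s = square_central_path(mu)
    print(mu, x[0], x[1], y[0] + s[0])     # last entry should be close to -1
```

As $\mu$ decreases the printed $x_1$ moves from near $1/2$ toward $1$ while $x_2$ stays at $1/2$, tracing the straight-line path described above.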



Dual Central Path 



Now consider the dual problem

$$(LD) \qquad \begin{array}{ll} \text{maximize} & y^T b \\ \text{subject to} & y^T A + s^T = c^T, \; s \ge 0. \end{array}$$

We may apply the barrier approach to this problem by formulating the problem

$$(BD) \qquad \begin{array}{ll} \text{maximize} & y^T b + \mu \sum_{j=1}^{n} \log s_j \\ \text{subject to} & y^T A + s^T = c^T, \; s > 0. \end{array}$$

We assume that the dual feasible set $\mathcal{F}_d$ has an interior
$\mathring{\mathcal{F}}_d = \{(y, s) : y^T A + s^T = c^T, \; s > 0\}$ that is nonempty and the optimal solution set of (LD) is bounded. Then, as $\mu$
is varied continuously toward 0, there is a path $(y(\mu), s(\mu))$ defined by the solution
to (BD). This path is termed the dual central path.
Fig. 5.2 The analytic path for the square



To work out the necessary and sufficient conditions we introduce $x$ as a Lagrange
multiplier and form the Lagrangian

$$y^T b + \mu \sum_{j=1}^{n} \log s_j - (y^T A + s^T - c^T)x.$$

Setting to zero the derivative with respect to $y_i$ leads to

$$b_i - a^i x = 0, \quad \text{for all } i,$$

where $a^i$ is the $i$th row of $A$. Setting to zero the derivative with respect to $s_j$ leads to

$$\mu/s_j - x_j = 0, \quad \text{or} \quad \mu - x_j s_j = 0, \quad \text{for all } j.$$

Combining these equations and including the original constraint yields the complete
set of conditions which are identical to the optimality conditions for the primal
central path (5.8). Note that $x$ is indeed a primal feasible solution and $x > 0$.

To see the geometric representation of the dual central path, consider the dual
level set

$$\Omega(z) = \{y : c^T - y^T A \ge 0, \; y^T b \ge z\}$$

for any $z < z^*$ where $z^*$ is the optimal value of (LD). Then, the analytic center
$(y(z), s(z))$ of $\Omega(z)$ coincides with the dual central path as $z$ tends to the optimal
value $z^*$ from below. This is illustrated in Fig. 5.3, where the feasible region of the
dual set (not the primal) is shown. The level sets $\Omega(z)$ are shown for various values
of $z$. The analytic centers of these level sets correspond to the dual central path.

Example 3 (The Square Dual). Consider the dual of Example 2. This is

$$\begin{array}{ll} \max & y_1 + y_2 \\ \text{subject to} & y_1 \le -1 \\ & y_2 \le 0. \end{array}$$

(The values of $s_1$ and $s_2$ are the slack variables of the inequalities.) The solution
to the dual barrier problem is easily found from the solution of the primal barrier
problem to be

$$y_1(\mu) = -1 - \mu/x_1(\mu), \qquad y_2(\mu) = -2\mu.$$

Fig. 5.3 The central path as analytic centers in the dual feasible region

As $\mu \to 0$, we have $y_1 \to -1$, $y_2 \to 0$, which is the unique solution to the dual LP.
However, as $\mu \to \infty$, the vector $y$ is unbounded, for in this case the dual feasible set
is itself unbounded.



Primal-Dual Central Path 

Suppose the feasible region of the primal (LP) has interior points and its optimal
solution set is bounded. Then, the dual also has interior points (see Exercise 4). The
primal-dual path is defined to be the set of vectors $(x(\mu) > 0, \; y(\mu), \; s(\mu) > 0)$ that
satisfy the conditions

$$\begin{array}{r} x \circ s = \mu\mathbf{1} \\ Ax = b \\ A^T y + s = c \end{array}$$

for $0 < \mu < \infty$. Hence the central path is defined without explicit reference to
an optimization problem. It is simply defined in terms of the set of equality and
inequality conditions.

Since conditions (5.8) and (5.9) are identical, the primal-dual central path can be 
split into two components by projecting onto the relevant space, as described in the 
following proposition. 

Proposition 1. Suppose the feasible sets of the primal and dual programs contain interior
points. Then the primal-dual central path $(x(\mu), y(\mu), s(\mu))$ exists for all $\mu$, $0 < \mu < \infty$.
Furthermore, $x(\mu)$ is the primal central path, and $(y(\mu), s(\mu))$ is the dual central path.
Moreover, $x(\mu)$ and $(y(\mu), s(\mu))$ converge to the analytic centers of the optimal primal solution
and dual solution faces, respectively, as $\mu \to 0$.



Duality Gap 

Let $(x(\mu), y(\mu), s(\mu))$ be on the primal-dual central path. Then from (5.9) it follows
that

$$c^T x - y^T b = y^T A x + s^T x - y^T b = s^T x = n\mu.$$

The value $c^T x - y^T b = s^T x$ is the difference between the primal objective value
and the dual objective value. This value is always nonnegative (see the weak duality
lemma in Sect. 4.2) and is termed the duality gap. At any point on the primal-dual
central path, the duality gap is equal to $n\mu$. It is clear that as $\mu \to 0$ the duality
gap goes to zero, and hence both $x(\mu)$ and $(y(\mu), s(\mu))$ approach optimality for the
primal and dual, respectively.

The duality gap provides a measure of closeness to optimality. For any primal
feasible $x$, the value $c^T x$ gives an upper bound as $c^T x \ge z^*$ where $z^*$ is the optimal
value of the primal. Likewise, for any dual feasible pair $(y, s)$, the value $y^T b$ gives
a lower bound as $y^T b \le z^*$. The difference, the duality gap $g = c^T x - y^T b$, provides
a bound on $z^*$ as $z^* \ge c^T x - g$. Hence if at a feasible point $x$, a dual feasible $(y, s)$
is available, the quality of $x$ can be measured as $c^T x - z^* \le g$.
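The bound $c^T x - z^* \le g$ holds for any feasible pair, not only for points on the central path. The small Python sketch below illustrates this on a toy problem chosen for this example (it does not appear in the text): minimize $x_1 + 2x_2$ subject to $x_1 + x_2 = 1$, $x \ge 0$, whose optimal value is $z^* = 1$. The function name is hypothetical.

```python
def duality_gap(c, b, x, y):
    """Return c^T x - y^T b for a primal feasible x and a dual feasible y."""
    primal = sum(cj * xj for cj, xj in zip(c, x))
    dual = sum(yi * bi for yi, bi in zip(y, b))
    return primal - dual

# Toy LP (illustrative): minimize x1 + 2*x2 subject to x1 + x2 = 1, x >= 0.
c, b = [1.0, 2.0], [1.0]
x = [0.7, 0.3]                      # primal feasible, not optimal
y = [0.8]                           # dual feasible since s = c - A^T y >= 0
s = [c[0] - y[0], c[1] - y[0]]      # slack vector (0.2, 1.2)
gap = duality_gap(c, b, x, y)       # equals s^T x
z_star = 1.0                        # optimal value, attained at x = (1, 0)
suboptimality = sum(cj * xj for cj, xj in zip(c, x)) - z_star
print(gap, suboptimality)
```

Here the true suboptimality of `x` is about 0.3, which is indeed no larger than the computed gap of about 0.5.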



5.6 Solution Strategies 

The various definitions of the central path directly suggest corresponding strate- 
gies for solution of a linear program. We outline three general approaches here: 
the primal barrier or path-following method, the primal-dual path-following method 
and the primal-dual potential-reduction method, although the details of their im- 
plementation and analysis must be deferred to later chapters after study of general 
nonlinear methods. Table 5.1 depicts these solution strategies and the simplex meth- 
ods described in Chaps. 3 and 4 with respect to how they meet the three optimality 
conditions: Primal Feasibility, Dual Feasibility, and Zero-Duality during the itera- 
tive process. 



Table 5.1 Properties of algorithms

                                   P-F    D-F    0-Duality
Primal simplex                     √             √
Dual simplex                              √      √
Primal barrier                     √
Primal-dual path-following         √      √
Primal-dual potential-reduction    √      √
For example, the primal simplex method keeps improving a primal feasible solution,
maintains the zero-duality gap (complementarity slackness condition) and
moves toward dual feasibility; while the dual simplex method keeps improving a 
dual feasible solution, maintains the zero-duality gap (complementarity condition) 
and moves toward primal feasibility (see Sect. 4.3). The primal barrier method keeps 
improving a primal feasible solution and moves toward dual feasibility and comple- 
mentarity; and the primal-dual interior-point methods keep improving a primal and 
dual feasible solution pair and move toward complementarity. 



Primal Barrier Method 

A direct approach is to use the barrier construction and solve the problem

$$\begin{array}{ll} \text{minimize} & c^T x - \mu \sum_{j=1}^{n} \log x_j \\ \text{subject to} & Ax = b, \; x > 0, \end{array} \tag{5.9}$$

for a very small value of $\mu$. In fact, if we desire to reduce the duality gap to $\varepsilon$ it is
only necessary to solve the problem for $\mu = \varepsilon/n$. Unfortunately, when $\mu$ is small,
the problem (5.9) could be highly ill-conditioned in the sense that the necessary
conditions are nearly singular. This makes it difficult to directly solve the problem
for small $\mu$.

An overall strategy, therefore, is to start with a moderately large $\mu$ (say $\mu = 100$)
and solve that problem approximately. The corresponding solution is a point
approximately on the primal central path, but it is likely to be quite distant from the
point corresponding to the limit of $\mu \to 0$. However this solution point at $\mu = 100$
can be used as the starting point for the problem with a slightly smaller $\mu$, for this
point is likely to be close to the solution of the new problem. The value of $\mu$ might
be reduced at each stage by a specific factor, giving $\mu_{k+1} = \gamma\mu_k$, where $\gamma$ is a fixed
positive parameter less than one and $k$ is the stage count.

If the strategy is begun with a value $\mu_0$, then at the $k$th stage we have $\mu_k = \gamma^k \mu_0$.
Hence to reduce $\mu_k/\mu_0$ to below $\varepsilon$ requires

$$k = \frac{\log \varepsilon}{\log \gamma}$$

stages.
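As a small worked example of this stage count, the Python sketch below rounds $\log\varepsilon / \log\gamma$ up to an integer. The function name is hypothetical, and $0 < \gamma < 1$ and $0 < \varepsilon < 1$ are assumed.

```python
import math

def barrier_stages(eps, gamma):
    """Smallest integer k with gamma**k <= eps (assumes 0 < gamma < 1, 0 < eps < 1)."""
    return math.ceil(math.log(eps) / math.log(gamma))

k = barrier_stages(1e-6, 0.5)
print(k, 0.5 ** k, 0.5 ** (k - 1))
```

With $\gamma = 1/2$ and $\varepsilon = 10^{-6}$ this gives 20 stages, since $2^{-20} \approx 9.5 \times 10^{-7}$ while $2^{-19} \approx 1.9 \times 10^{-6}$.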

Often a version of Newton’s method for minimization is used to solve each of the
problems. For the current strategy, Newton’s method works on problem (5.9) with
fixed $\mu$ by considering the central path equations (5.8)

$$\begin{array}{r} x \circ s = \mu\mathbf{1} \\ Ax = b \\ A^T y + s = c. \end{array} \tag{5.10}$$
From a given point $x \in \mathring{\mathcal{F}}_p$, Newton’s method moves to a closer point $x^+ \in \mathring{\mathcal{F}}_p$
by moving in the directions $d_x$, $d_y$ and $d_s$ determined from the linearized version
of (5.10)

$$\begin{array}{r} \mu X^{-2} d_x + d_s = \mu X^{-1}\mathbf{1} - c, \\ A d_x = 0, \\ -A^T d_y - d_s = 0. \end{array} \tag{5.11}$$

(Recall that $X$ is the diagonal matrix whose diagonal entries are components of
$x > 0$.) The new point is then updated by taking a step in the direction of $d_x$, as
$x^+ = x + d_x$.

Notice that if $x \circ s = \mu\mathbf{1}$ for some $s = c - A^T y$, then $d_x = 0$ because
the current point satisfies $Ax = b$ and hence is already the central path solution
for $\mu$. If some component of $x \circ s$ is less than $\mu$, then $d$ will tend to increment the
solution so as to increase that component. The converse will occur for components
of $x \circ s$ greater than $\mu$.

This process may be repeated several times until a point close enough to the
proper solution to the barrier problem for the given value of $\mu$ is obtained. That is,
until the necessary and sufficient conditions (5.7) are (approximately) satisfied.

There are several details involved in a complete implementation and analysis of 
Newton’s method. These items are discussed in later chapters of the text. However, 
the method works well if either $\mu$ is moderately large, or if the algorithm is initiated
at a point very close to the solution, exactly as needed for the barrier strategy
discussed in this subsection.

To solve (5.11), premultiply both sides of its first equation by $X^2$ to obtain

$$\mu d_x + X^2 d_s = \mu X\mathbf{1} - X^2 c.$$

Then, premultiplying by $A$ and using $A d_x = 0$, we have

$$A X^2 d_s = \mu A X\mathbf{1} - A X^2 c.$$

Using $d_s = -A^T d_y$ we have

$$(A X^2 A^T) d_y = -\mu A X\mathbf{1} + A X^2 c.$$

Thus, $d_y$ can be computed by solving the above linear system of equations. Then $d_s$
can be found from the third equation in (5.11) and finally $d_x$ can be found from the
first equation in (5.11). Together this amounts to $O(nm^2 + m^3)$ arithmetic operations
for each Newton step.
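The three solves just described can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the starting point is assumed to satisfy $Ax = b$, $x > 0$, and the simple step halving that keeps $x$ positive is a safeguard added here rather than something prescribed by the text.

```python
import numpy as np

def barrier_newton_step(A, c, x, mu):
    """Solve the linearized system (5.11) for (d_x, d_y, d_s)."""
    X2 = x * x                                  # diagonal of X^2
    M = (A * X2) @ A.T                          # A X^2 A^T
    rhs = -mu * (A @ x) + A @ (X2 * c)          # -mu A X 1 + A X^2 c
    d_y = np.linalg.solve(M, rhs)
    d_s = -A.T @ d_y                            # third equation of (5.11)
    d_x = x - X2 * (c + d_s) / mu               # first equation of (5.11)
    return d_x, d_y, d_s

def solve_barrier(A, c, x, mu, iters=30):
    """Repeat Newton steps for fixed mu, halving the step to keep x > 0."""
    for _ in range(iters):
        d_x, _, _ = barrier_newton_step(A, c, x, mu)
        t = 1.0
        while np.any(x + t * d_x <= 0):
            t *= 0.5
        x = x + t * d_x
    return x
```

Applied to Example 2 with $\mu = 1$ and the starting point $x = (1/2, 1/2, 1/2, 1/2)$, the first step is $d_x = (0.125, 0, -0.125, 0)$ and the iteration converges to $x_1 = (\sqrt{5} - 1)/2 \approx 0.618$, the value given by the closed-form central path.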
Primal-Dual Path-Following



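Another strategy for solving a linear program is to follow the central path from a
given initial primal-dual solution pair. Consider a linear program in standard form

$$\begin{array}{llll} & \text{Primal} & & \text{Dual} \\ \text{minimize} & c^T x & \text{maximize} & y^T b \\ \text{subject to} & Ax = b, \; x \ge 0 & \text{subject to} & y^T A \le c^T. \end{array}$$

Assume that the interior of both primal and dual feasible regions $\mathring{\mathcal{F}} \ne \emptyset$; that is,
both⁴

$$\mathring{\mathcal{F}}_p = \{x : Ax = b, \; x > 0\} \ne \emptyset \quad \text{and} \quad \mathring{\mathcal{F}}_d = \{(y, s) : s = c - A^T y > 0\} \ne \emptyset;$$

and denote by $z^*$ the optimal objective value.

The central path can be expressed as

$$\mathcal{C} = \left\{ (x, y, s) \in \mathring{\mathcal{F}} : x \circ s = \frac{x^T s}{n}\mathbf{1} \right\}$$

in the primal-dual form. On the path we have $x \circ s = \mu\mathbf{1}$ and hence $s^T x = n\mu$.
A neighborhood of the central path $\mathcal{C}$ is of the form

$$\mathcal{N}(\eta) = \{(x, y, s) \in \mathring{\mathcal{F}} : |s \circ x - \mu\mathbf{1}| \le \eta\mu, \text{ where } \mu = s^T x/n\} \tag{5.12}$$

for some $\eta \in (0, 1)$, say $\eta = 1/4$. This can be thought of as a tube whose center is
the central path.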

The idea of the path-following method is to move within a tubular neighborhood
of the central path toward the solution point. A suitable initial point $(x^0, y^0, s^0) \in
\mathcal{N}(\eta)$ can be found by solving the barrier problem for some fixed $\mu_0$ or from an
initialization phase proposed later. After that, step by step moves are made, alternating
between a predictor step and a corrector step. After each pair of steps, the point
achieved is again in the fixed given neighborhood of the central path, but closer to
the linear program’s solution set.

The predictor step is designed to move essentially parallel to the true central
path. The step $d \equiv (d_x, d_y, d_s)$ is determined from the linearized version of the
primal-dual central path equations of (5.9), as

$$\begin{array}{r} s \circ d_x + x \circ d_s = \gamma\mu\mathbf{1} - x \circ s, \\ A d_x = 0, \\ -A^T d_y - d_s = 0, \end{array} \tag{5.13}$$

where here one selects $\gamma = 0$. (To show the dependence of $d$ on the current pair
$(x, s)$ and the parameter $\gamma$, we write $d = d(x, s, \gamma)$.)



4 The symbol $\emptyset$ denotes the empty set.
The new point is then found by taking a step in the direction of $d$, as $(x^+, y^+, s^+) =
(x, y, s) + \alpha(d_x, d_y, d_s)$, where $\alpha$ is the step-size. Note that $d_x^T d_s = -d_x^T A^T d_y = 0$
here. Then

$$(x^+)^T s^+ = (x + \alpha d_x)^T (s + \alpha d_s) = x^T s + \alpha(d_x^T s + x^T d_s) = (1 - \alpha)x^T s,$$

where the last step follows by multiplying the first equation in (5.13) by $\mathbf{1}^T$. Thus,
the predictor step reduces the duality gap by a factor $1 - \alpha$. The step-size $\alpha$ is taken
as large as possible in this direction without going outside
of the neighborhood $\mathcal{N}(2\eta)$.

The corrector step essentially moves perpendicular to the central path in order to
get closer to it. This step moves the solution back to within the neighborhood $\mathcal{N}(\eta)$,
and the step is determined by selecting $\gamma = 1$ in (5.13) with $\mu = x^T s/n$. Notice that
if $x \circ s = \mu\mathbf{1}$, then $d = 0$ because the current point is already a central path solution.

This corrector step is identical to one step of the barrier method. Note, however,
that the predictor-corrector method requires only one sequence of steps, each
consisting of a single predictor and corrector. This contrasts with the barrier method
which requires a complete sequence for each $\mu$ to get back to the central path, and
then an outer sequence to reduce the $\mu$’s.

One can prove that for any $(x, y, s) \in \mathcal{N}(\eta)$ with $\mu = x^T s/n$, the step-size in the
predictor step satisfies

$$\alpha \ge \frac{1}{2\sqrt{n}}.$$

Thus, the iteration complexity of the method is $O(\sqrt{n}\log(1/\varepsilon))$ to achieve $\mu/\mu_0 \le \varepsilon$
where $n\mu_0$ is the initial duality gap. Moreover, one can prove that the step-size $\alpha \to 1$
as $x^T s \to 0$, that is, the duality reduction speed is accelerated as the gap becomes
smaller.
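The alternation of predictor and corrector steps can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the Newton system (5.13) is solved here as one block linear system for simplicity, the norm in (5.12) is taken to be the Euclidean norm, and the predictor step-size is found by a crude backtracking (factor 0.9) rather than by computing the exact largest step. The starting point is assumed to be strictly feasible and inside $\mathcal{N}(\eta)$.

```python
import numpy as np

def newton_direction(A, x, s, gamma):
    """Solve (5.13) as one block linear system in (d_x, d_y, d_s)."""
    m, n = A.shape
    mu = x @ s / n
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:n, :n] = np.diag(s)                 # s o d_x
    K[:n, n + m:] = np.diag(x)             # + x o d_s
    K[n:n + m, :n] = A                     # A d_x = 0
    K[n + m:, n:n + m] = -A.T              # -A^T d_y
    K[n + m:, n + m:] = -np.eye(n)         # - d_s = 0
    rhs = np.concatenate([gamma * mu - x * s, np.zeros(m), np.zeros(n)])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def in_neighborhood(x, s, eta):
    """Membership test for N(eta) of (5.12), given that feasibility is maintained."""
    mu = x @ s / x.size
    return bool(np.all(x > 0) and np.all(s > 0)
                and np.linalg.norm(x * s - mu) <= eta * mu)

def predictor_corrector(A, x, y, s, eta=0.25, eps=1e-8, max_iter=200):
    """Alternate predictor (gamma = 0) and corrector (gamma = 1) steps."""
    gap0 = x @ s
    for k in range(max_iter):
        if x @ s <= eps * gap0:
            return x, y, s, k
        d_x, d_y, d_s = newton_direction(A, x, s, 0.0)          # predictor
        alpha = 1.0
        while alpha > 1e-12 and not in_neighborhood(
                x + alpha * d_x, s + alpha * d_s, 2 * eta):
            alpha *= 0.9                                        # stay inside N(2 eta)
        x, y, s = x + alpha * d_x, y + alpha * d_y, s + alpha * d_s
        d_x, d_y, d_s = newton_direction(A, x, s, 1.0)          # corrector, full step
        x, y, s = x + d_x, y + d_y, s + d_s
    return x, y, s, max_iter
```

Started from the central path point of Example 2 at $\mu = 1$, the first predictor step found by this backtracking is roughly $\alpha \approx 0.66$, so the duality gap shrinks by about two thirds before the corrector recenters the iterate.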



Primal-Dual Potential Reduction Algorithm 

In this method a primal-dual potential function is used to measure the solution’s
progress. The potential is reduced at each iteration. There is no restriction on either
neighborhood or step-size during the iterative process as long as the potential is
reduced. The greater the reduction of the potential function, the faster the convergence
of the algorithm. Thus, from a practical point of view, potential-reduction
algorithms may have an advantage over path-following algorithms where iterates
are confined to lie in certain neighborhoods of the central path.

For $x \in \mathring{\mathcal{F}}_p$ and $(y, s) \in \mathring{\mathcal{F}}_d$ the primal-dual potential function is defined by

$$\psi_{n+\rho}(x, s) \equiv (n + \rho)\log(x^T s) - \sum_{j=1}^{n} \log(x_j s_j), \tag{5.14}$$

where $\rho \ge 0$.
From the arithmetic and geometric mean inequality (also see Exercise 10) we
can derive that

$$n\log(x^T s) - \sum_{j=1}^{n} \log(x_j s_j) \ge n\log n.$$

Then

$$\psi_{n+\rho}(x, s) = \rho\log(x^T s) + n\log(x^T s) - \sum_{j=1}^{n} \log(x_j s_j) \ge \rho\log(x^T s) + n\log n. \tag{5.15}$$

Thus, for $\rho > 0$, $\psi_{n+\rho}(x, s) \to -\infty$ implies that $x^T s \to 0$. More precisely, we have
from (5.15)

$$x^T s \le \exp\left(\frac{\psi_{n+\rho}(x, s) - n\log n}{\rho}\right).$$

Hence the primal-dual potential function gives an explicit bound on the magnitude
of the duality gap.

The objective of this method is to drive the potential function down toward minus
infinity. The method of reduction is a version of Newton’s method (5.13). In this
case we select $\gamma = n/(n + \rho)$ in (5.13). Notice that this is a combination of a predictor
and corrector choice. The predictor uses $\gamma = 0$ and the corrector uses $\gamma = 1$. The
primal-dual potential method uses something in between. This seems logical, for
the predictor moves parallel to the central path toward a lower duality gap, and the
corrector moves perpendicular to get close to the central path. This new method
does both at once. Of course, this intuitive notion must be made precise.

For $\rho \ge \sqrt{n}$, there is in fact a guaranteed decrease in the potential function by a
fixed amount $\delta$ (see Exercises 12 and 13). Specifically,

$$\psi_{n+\rho}(x^+, s^+) - \psi_{n+\rho}(x, s) \le -\delta \tag{5.16}$$

for a constant $\delta \ge 0.2$. This result provides a theoretical bound on the number of
required iterations and the bound is competitive with other methods. However, a
faster algorithm may be achieved by conducting a line search along direction $d$ to
achieve the greatest reduction in the primal-dual potential function at each iteration.
We outline the algorithm here:

Step 1. Start at a point $(x_0, y_0, s_0) \in \mathring{\mathcal{F}}$ with $\psi_{n+\rho}(x_0, s_0) \le \rho\log((s_0)^T x_0) +
n\log n + O(\sqrt{n}\log n)$ which is determined by an initiation procedure, as discussed
in Sect. 5.7. Set $\rho \ge \sqrt{n}$. Set $k = 0$ and $\gamma = n/(n + \rho)$. Select an accuracy
parameter $\varepsilon > 0$.

Step 2. Set $(x, s) = (x_k, s_k)$ and compute $(d_x, d_y, d_s)$ from (5.13).

Step 3. Let $x_{k+1} = x_k + \bar{\alpha} d_x$, $y_{k+1} = y_k + \bar{\alpha} d_y$, and $s_{k+1} = s_k + \bar{\alpha} d_s$ where

$$\bar{\alpha} = \arg\min_{\alpha \ge 0} \psi_{n+\rho}(x_k + \alpha d_x, \; s_k + \alpha d_s).$$

Step 4. Let $k = k + 1$. If $\dfrac{s_k^T x_k}{s_0^T x_0} \le \varepsilon$, Stop. Otherwise return to Step 2.
Theorem 2. The algorithm above terminates in at most $O(\rho\log(n/\varepsilon))$ iterations with

$$\frac{(s_k)^T x_k}{(s_0)^T x_0} \le \varepsilon.$$

Proof. Note that after $k$ iterations, we have from (5.16)

$$\psi_{n+\rho}(x_k, s_k) \le \psi_{n+\rho}(x_0, s_0) - k \cdot \delta \le \rho\log((s_0)^T x_0) + n\log n + O(\sqrt{n}\log n) - k \cdot \delta.$$

Thus, from the inequality (5.15),

$$\rho\log(s_k^T x_k) + n\log n \le \rho\log(s_0^T x_0) + n\log n + O(\sqrt{n}\log n) - k \cdot \delta,$$

or

$$\rho\left(\log(s_k^T x_k) - \log(s_0^T x_0)\right) \le -k \cdot \delta + O(\sqrt{n}\log n).$$

Therefore, as soon as $k \ge O(\rho\log(n/\varepsilon))$, we must have

$$\rho\left(\log(s_k^T x_k) - \log(s_0^T x_0)\right) \le -\rho\log(1/\varepsilon),$$

or

$$\frac{s_k^T x_k}{s_0^T x_0} \le \varepsilon.$$

Theorem 2 holds for any $\rho \ge \sqrt{n}$. Thus, by choosing $\rho = \sqrt{n}$, the iteration
complexity bound becomes $O(\sqrt{n}\log(n/\varepsilon))$.



Iteration Complexity 

The computation of each iteration basically requires solving (5.13) for $d$. Note that
the first equation of (5.13) can be written as

$$S d_x + X d_s = \gamma\mu\mathbf{1} - XS\mathbf{1}$$

where $X$ and $S$ are two diagonal matrices whose diagonal entries are components of
$x > 0$ and $s > 0$, respectively. Premultiplying both sides by $S^{-1}$ we have

$$d_x + S^{-1}X d_s = \gamma\mu S^{-1}\mathbf{1} - x.$$

Then, premultiplying by $A$ and using $A d_x = 0$, we have

$$A S^{-1} X d_s = \gamma\mu A S^{-1}\mathbf{1} - Ax = \gamma\mu A S^{-1}\mathbf{1} - b.$$

Using $d_s = -A^T d_y$ we have

$$(A S^{-1} X A^T) d_y = b - \gamma\mu A S^{-1}\mathbf{1}.$$
Thus, the primary computational cost of each iteration of the interior-point algorithm
discussed in this section is to form and invert the normal matrix $A X S^{-1} A^T$, which
typically requires $O(nm^2 + m^3)$ arithmetic operations. However, an approximation
of this matrix can be updated and inverted using far fewer arithmetic operations. In
fact, using a rank-one technique (see Chap. 10) to update the approximate inverse of
the normal matrix during the iterative progress, one can reduce the average number
of arithmetic operations per iteration to $O(\sqrt{n}\,m^2)$. Thus, if the relative tolerance $\varepsilon$
is viewed as a variable, we have the following total arithmetic operation complexity
bound to solve a linear program:

Corollary. Let $\rho = \sqrt{n}$. Then, the algorithm above Theorem 2 terminates in at most
$O(nm^2\log(n/\varepsilon))$ arithmetic operations.
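Steps 1 to 4 of the potential-reduction algorithm, with the direction obtained from the normal equations derived above, can be sketched in Python with NumPy as follows. Several things here are assumptions made for illustration rather than part of the text: the function names, the use of a dense solve instead of the rank-one updating technique, and the line search, which simply evaluates the potential on a grid of 99 step sizes below the largest step that keeps $x$ and $s$ positive. The starting point is assumed to be strictly feasible.

```python
import numpy as np

def potential(x, s, rho):
    """Primal-dual potential function (5.14)."""
    n = x.size
    return (n + rho) * np.log(x @ s) - np.sum(np.log(x * s))

def direction(A, b, x, s, gamma):
    """Solve (5.13) through the normal equations (A S^-1 X A^T) d_y = b - gamma mu A S^-1 1."""
    mu = x @ s / x.size
    w = x / s                                   # diagonal of S^-1 X
    M = (A * w) @ A.T
    d_y = np.linalg.solve(M, b - gamma * mu * (A @ (1.0 / s)))
    d_s = -A.T @ d_y
    d_x = gamma * mu / s - x - w * d_s
    return d_x, d_y, d_s

def max_step(v, d):
    """Largest step t with v + t d >= 0 (infinite if d has no negative entry)."""
    neg = d < 0
    return np.min(-v[neg] / d[neg]) if np.any(neg) else np.inf

def potential_reduction(A, b, x, y, s, rho=None, eps=1e-8, max_iter=500):
    n = x.size
    rho = np.sqrt(n) if rho is None else rho
    gamma = n / (n + rho)
    gap0 = x @ s
    for k in range(max_iter):
        if x @ s <= eps * gap0:                 # stopping test of Step 4
            return x, y, s, k
        d_x, d_y, d_s = direction(A, b, x, s, gamma)
        a_max = min(max_step(x, d_x), max_step(s, d_s))
        alphas = a_max * np.linspace(0.01, 0.99, 99)   # crude grid line search
        values = [potential(x + a * d_x, s + a * d_s, rho) for a in alphas]
        a = alphas[int(np.argmin(values))]
        x, y, s = x + a * d_x, y + a * d_y, s + a * d_s
    return x, y, s, max_iter
```

On the data of Example 2, started from the central path point at $\mu = 1$, the first line search already lowers the potential by more than 2, well above the guaranteed $\delta \ge 0.2$; a production code would replace the grid by a proper one-dimensional minimization.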



5.7 Termination and Initialization 

There are several remaining important issues concerning interior-point algorithms 
for linear programs. The first issue involves termination. Unlike the simplex method 
which terminates with an exact solution, interior-point algorithms are continuous 
optimization algorithms that generate an infinite solution sequence converging to an 
optimal solution. If the data of a particular problem are integral or rational, an argu- 
ment is made that, after the worst-case time bound, an exact solution can be rounded 
from the latest approximate solution. Several questions arise. First, under the real 
number computation model (that is, the data consists of real numbers), how can 
we terminate at an exact solution? Second, regardless of the data’s status, is there a 
practical test, which can be computed cost-effectively during the iterative process, to 
identify an exact solution so that the algorithm can be terminated before the worst-case
time bound? Here, by exact solution we mean one that could be found using
exact arithmetic, such as the solution of a system of linear equations, which can be 
computed in a number of arithmetic operations bounded by a polynomial in n. 

The second issue involves initialization. Almost all interior-point algorithms
require the regularity assumption that $\mathring{\mathcal{F}} \neq \emptyset$. What is to be done if this is not true?
A related issue is that interior-point algorithms have to start at a strictly feasible
point near the central path.



* Termination 



Complexity bounds for interior-point algorithms generally depend on an $\varepsilon$ which
must be zero in order to obtain an exact optimal solution. Sometimes it is advantageous
to employ an early termination or rounding method while $\varepsilon$ is still moderately
large. There are five basic approaches.

• A “purification” procedure finds a feasible corner whose objective value is at
least as good as the current interior point. This can be accomplished in strongly 
polynomial time (that is, the complexity bound is a polynomial only in the 
dimensions m and n). One difficulty is that there may be many non-optimal ver- 
tices close to the optimal face, and the procedure might require many pivot steps 
for difficult problems. 

• A second method seeks to identify an optimal basis. It has been shown that if 
the linear program is nondegenerate, the unique optimal basis may be identified 
early. The procedure seems to work well for some problems but it has diffi- 
culty if the problem is degenerate. Unfortunately, most real linear programs are 
degenerate. 

• The third approach is to slightly perturb the data such that the new program 
is nondegenerate and its optimal basis remains one of the optimal bases of the 
original program. There are questions about how and when to perturb the data 
during the iterative process, decisions which can significantly affect the success 
of the effort. 

• The fourth approach is to guess the optimal face and find a feasible solution on
that face. It consists of two phases: the first phase uses interior-point algorithms to
identify the complementarity partition $(P^*, Z^*)$ (see Exercise 6), and the second
phase adapts the simplex method to find an optimal primal (or dual) basic solution,
using $(P^*, Z^*)$ as a starting basis. This method
is often called the cross-over method. It is guaranteed to work in finite time and
is implemented in several popular linear programming software packages.

• The fifth approach is to guess the optimal face and project the current interior 
point onto the interior of the optimal face. See Fig. 5.4. The termination criterion 
is guaranteed to work in finite time. 

The fourth and fifth methods above are based on the fact that (as observed in practice 
and subsequently proved) many interior-point algorithms for linear programming 
generate solution sequences that converge to a strictly complementary solution or 
an interior solution on the optimal face; see Exercise 8. 



Initialization 

Most interior-point algorithms must be initiated at a strictly feasible point. The 
complexity of obtaining such an initial point is the same as that of solving the 
linear program itself. More importantly, a complete algorithm should accomplish 
two tasks: (1) detect the infeasibility or unboundedness status of the problem, then 
(2) generate an optimal solution if the problem is neither infeasible nor unbounded. 
Several approaches have been proposed to accomplish these goals: 

• The primal and dual can be combined into a single linear feasibility problem,
and a feasible point found. Theoretically, this approach achieves the currently
best iteration complexity bound, that is, $O(\sqrt{n}\log(1/\varepsilon))$. Practically, a significant
disadvantage of this approach is the doubled dimension of the system of equations
that must be solved at each iteration.

• The big-M method can be used by adding one or more artificial column(s) and/or
row(s) and a huge penalty parameter M to force solutions to become feasible
during the algorithm. A major disadvantage of this approach is the numerical
problems caused by the addition of coefficients of large magnitude.

• Phase I-then-Phase II methods are effective. A major disadvantage of this 
approach is that the two (or three) related linear programs must be solved 
sequentially. 




Fig. 5.4 Illustration of the projection of an interior point onto the optimal face 



• A modified Phase I-Phase II method approaches feasibility and optimality
simultaneously. To our knowledge, the currently best iteration complexity bound
of this approach is $O(n\log(1/\varepsilon))$, as compared to $O(\sqrt{n}\log(1/\varepsilon))$ of the three
above. Other disadvantages of the method include the assumption of non-empty
interior and the need for an objective lower bound.



The HSD Algorithm 

There is an algorithm, termed the Homogeneous Self-Dual Algorithm, that overcomes
the difficulties mentioned above. The algorithm achieves the theoretically
best $O(\sqrt{n}\log(1/\varepsilon))$ complexity bound and is often used in linear programming
software packages.

The algorithm is based on the construction of a homogeneous and self-dual linear 
program related to (LP) and (LD) (see Sect. 5.5). We now briefly explain the two 
major concepts, homogeneity and self-duality, used in the construction. 

In general, a system of linear equations or inequalities is homogeneous if the right
hand side components are all zero. Then if a solution is found, any positive multiple
of that solution is also a solution. In the construction used below, we allow a
single inhomogeneous constraint, often called a normalizing constraint. Karmarkar’s
original canonical form is a homogeneous linear program.

A linear program is termed self-dual if the dual of the problem is equivalent to
the primal. The advantage of self-duality is that we can apply a primal-dual interior-
point algorithm to solve the self-dual problem without doubling the dimension of
the linear system solved at each iteration.

The homogeneous and self-dual linear program (HSDP) is constructed from (LP)
and (LD) in such a way that the point $x = \mathbf{1}$, $y = \mathbf{0}$, $\tau = 1$, $z = 1$, $\theta = 1$ is feasible.
The primal program is

$$
\begin{array}{lrrrrcl}
\text{(HSDP)}\quad \text{minimize} & & & & (n+1)\theta & & \\
\text{subject to} & & Ax & -b\tau & +\bar{b}\theta & = & 0, \\
 & -A^T y & & +c\tau & -\bar{c}\theta & \ge & 0, \\
 & b^T y & -c^T x & & +\bar{z}\theta & \ge & 0, \\
 & -\bar{b}^T y & +\bar{c}^T x & -\bar{z}\tau & & = & -(n+1), \\
 & y \text{ free}, & x \ge 0, & \tau \ge 0, & \theta \text{ free}; & &
\end{array}
$$

where

$$\bar{b} = b - A\mathbf{1}, \qquad \bar{c} = c - \mathbf{1}, \qquad \bar{z} = c^T\mathbf{1} + 1. \qquad (5.17)$$

Notice that $\bar{b}$, $\bar{c}$, and $\bar{z}$ represent the “infeasibility” of the initial primal point, dual
point, and primal-dual “gap,” respectively. They are chosen so that the system is
feasible. For example, for the point $x = \mathbf{1}$, $y = \mathbf{0}$, $\tau = 1$, $\theta = 1$, the last equation
becomes

$$0 + c^T x - \mathbf{1}^T x - (c^T x + 1) = -n - 1.$$

Note also that the top two constraints in (HSDP), with $\tau = 1$ and $\theta = 0$, represent
primal and dual feasibility (with $x \ge 0$). The third constraint represents reversed
weak duality, $b^T y \ge c^T x$, rather than the usual $c^T x \ge b^T y$. So if these three constraints
are satisfied with $\tau = 1$ and $\theta = 0$ they define primal and dual optimal solutions.
Then, to achieve primal and dual feasibility for $x = \mathbf{1}$, $(y, s) = (\mathbf{0}, \mathbf{1})$, we add the
artificial variable $\theta$. The fourth constraint is added to achieve self-duality.

The problem is self-dual because its overall coefficient matrix has the property
that its transpose is equal to its negative. It is skew-symmetric.

Denote by $s$ the slack vector for the second constraint and by $\kappa$ the slack scalar for
the third constraint. Denote by $\mathcal{F}_h$ the set of all points $(y, x, \tau, \theta, s, \kappa)$ that are feasible
for (HSDP). Denote by $\mathcal{F}_h^0$ the set of strictly feasible points with $(x, \tau, s, \kappa) > 0$
in $\mathcal{F}_h$. By combining the constraints (Exercise 14) we can write the last (equality)
constraint as

$$\mathbf{1}^T x + \mathbf{1}^T s + \tau + \kappa - (n+1)\theta = (n+1), \qquad (5.18)$$

which serves as a normalizing constraint for (HSDP). This implies that for $0 \le \theta \le 1$
the variables in this equation are bounded.

We state without proof the following basic result.

Theorem 1. Consider problem (HSDP).

(i) (HSDP) has an optimal solution and its optimal solution set is bounded.

(ii) The optimal value of (HSDP) is zero, and
$(y, x, \tau, \theta, s, \kappa) \in \mathcal{F}_h$ implies that $(n + 1)\theta = x^T s + \tau\kappa$.

(iii) There is an optimal solution $(y^*, x^*, \tau^*, \theta^* = 0, s^*, \kappa^*) \in \mathcal{F}_h$ such that

$$\begin{pmatrix} x^* + s^* \\ \tau^* + \kappa^* \end{pmatrix} > 0,$$

which we call a strictly self-complementary solution.

Part (ii) of the theorem shows that as $\theta$ goes to zero, the solution tends toward
satisfying complementary slackness between $x$ and $s$ and between $\tau$ and $\kappa$. Part (iii)
shows that at a solution with $\theta = 0$, the complementary slackness is strict in the sense
that at least one member of a complementary pair must be positive. For example,
$x_1 s_1 = 0$ is required by complementary slackness, but in this case $x_1 = 0$, $s_1 = 0$
will not occur; exactly one of them must be positive.
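The feasibility of the starting point and the identities in (5.18) and Theorem 1(ii) can be checked numerically. The following is a minimal sketch in plain Python; the helper name `hsdp_check` and the small data set (one equality constraint, three variables) are illustrative assumptions, not part of the text.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hsdp_check(A, b, c, y, x, tau, theta):
    """Evaluate the (HSDP) constraints at a point.

    Returns the residual of the first (equality) constraint, the slack
    vector s of the second, the slack scalar kappa of the third, and the
    residual of the fourth (equality) constraint.
    """
    m, n = len(A), len(c)
    ones = [1.0] * n
    b_bar = [b[i] - dot(A[i], ones) for i in range(m)]   # b - A1
    c_bar = [c[j] - 1.0 for j in range(n)]               # c - 1
    z_bar = dot(c, ones) + 1.0                           # c^T 1 + 1
    At_y = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
    r1 = [dot(A[i], x) - b[i] * tau + b_bar[i] * theta for i in range(m)]
    s = [-At_y[j] + c[j] * tau - c_bar[j] * theta for j in range(n)]
    kappa = dot(b, y) - dot(c, x) + z_bar * theta
    r4 = -dot(b_bar, y) + dot(c_bar, x) - z_bar * tau + (n + 1)
    return r1, s, kappa, r4

# Illustrative data (an assumption): minimize 2x1 + x2 + x3, x1 + x2 + x3 = 1, x >= 0.
A = [[1.0, 1.0, 1.0]]
b = [1.0]
c = [2.0, 1.0, 1.0]
n = len(c)

# The initial point x = 1, y = 0, tau = 1, theta = 1.
x, y, tau, theta = [1.0] * n, [0.0], 1.0, 1.0
r1, s, kappa, r4 = hsdp_check(A, b, c, y, x, tau, theta)

gap = dot(x, s) + tau * kappa                              # should equal (n+1)*theta
normalizer = sum(x) + sum(s) + tau + kappa - (n + 1) * theta  # left side of (5.18)
```

At this point both equality residuals vanish, the slacks come out as $s = \mathbf{1}$ and $\kappa = 1$, and both `gap` and `normalizer` equal $n + 1$.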

We now relate optimal solutions to (HSDP) to those for (LP) and (LD). 

Theorem 2. Let $(y^*, x^*, \tau^*, \theta^* = 0, s^*, \kappa^*)$ be a strictly self-complementary solution for
(HSDP).

(i) (LP) has a solution (feasible and bounded) if and only if $\tau^* > 0$. In this case, $x^*/\tau^*$ is
an optimal solution for (LP) and $(y^*/\tau^*, s^*/\tau^*)$ is an optimal solution for (LD).

(ii) (LP) has no solution if and only if $\kappa^* > 0$. In this case, $x^*/\kappa^*$ or $y^*/\kappa^*$ or both are
certificates for proving infeasibility: if $c^T x^* < 0$ then (LD) is infeasible; if $-b^T y^* < 0$
then (LP) is infeasible; and if both $c^T x^* < 0$ and $-b^T y^* < 0$ then both (LP) and (LD)
are infeasible.



Proof. We prove the second statement. We first assume that one of (LP) and (LD)
is infeasible, say (LD) is infeasible. Then there is some certificate $\bar{x} \ge 0$ such that
$A\bar{x} = 0$ and $c^T\bar{x} = -1$. Let $(\bar{y} = 0, \bar{s} = 0)$ and

$$\alpha = \frac{n+1}{\mathbf{1}^T\bar{x} + \mathbf{1}^T\bar{s} + 1} > 0.$$

Then one can verify that

$$\tilde{y}^* = \alpha\bar{y}, \quad \tilde{x}^* = \alpha\bar{x}, \quad \tilde{\tau}^* = 0, \quad \tilde{\theta}^* = 0, \quad \tilde{s}^* = \alpha\bar{s}, \quad \tilde{\kappa}^* = \alpha$$

is a self-complementary solution for (HSDP). Since the supporting set (the set of
positive entries) of a strictly complementary solution for (HSDP) is unique (see
Exercise 6), $\kappa^* > 0$ at any strictly complementary solution for (HSDP).

Conversely, if $\tau^* = 0$, then $\kappa^* > 0$, which implies that $c^T x^* - b^T y^* < 0$, i.e.,
at least one of $c^T x^*$ and $-b^T y^*$ is strictly less than zero. Let us say $c^T x^* < 0$. In
addition, we have

$$Ax^* = 0, \quad A^T y^* + s^* = 0, \quad (x^*)^T s^* = 0 \quad \text{and} \quad x^* + s^* > 0.$$

From Farkas’ lemma (Exercise 5), $x^*/\kappa^*$ is a certificate for proving dual infeasibility.
The other cases hold similarly. ∎
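Theorem 2 translates directly into a post-processing rule for a computed solution of (HSDP). The sketch below is an illustration only: the function name `interpret_hsdp`, the tolerance `tol`, and the two tiny data sets are assumptions introduced here, and a practical code would need more careful thresholds.

```python
def interpret_hsdp(x, y, s, tau, kappa, b, c, tol=1e-9):
    """Classify a strictly self-complementary (HSDP) solution with theta = 0.

    Follows Theorem 2: tau > 0 yields optimal solutions by scaling,
    kappa > 0 yields infeasibility certificates.
    """
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    if tau > tol:
        return {"status": "optimal",
                "x": [xj / tau for xj in x],
                "y": [yi / tau for yi in y],
                "s": [sj / tau for sj in s]}
    if kappa > tol:
        return {"status": "infeasible",
                "dual_infeasible": dot(c, x) < -tol,      # c^T x* < 0
                "primal_infeasible": -dot(b, y) < -tol}   # -b^T y* < 0
    return {"status": "undetermined"}

# Case 1 (assumed data): minimize 2x1 + x2 + x3, x1 + x2 + x3 = 1, x >= 0.
# One strictly self-complementary solution has tau = 4/3 and kappa = 0.
t = 4.0 / 3.0
solved = interpret_hsdp(x=[0.0, t / 2, t / 2], y=[t], s=[t, 0.0, 0.0],
                        tau=t, kappa=0.0, b=[1.0], c=[2.0, 1.0, 1.0])

# Case 2 (assumed data): minimize -x1 subject to x1 - x2 = 0, x >= 0 is unbounded,
# so (LD) is infeasible; the construction in the proof gives tau = 0, kappa = 1.
unbounded = interpret_hsdp(x=[1.0, 1.0], y=[0.0], s=[0.0, 0.0],
                           tau=0.0, kappa=1.0, b=[0.0], c=[-1.0, 0.0])
```

In the first case the scaled point $x^*/\tau^*$ is $(0, 0.5, 0.5)$, which lies on the optimal face; in the second, only the dual infeasibility flag is raised.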
To solve (HSDP), we have the following theorem that resembles the central
path analyzed for (LP) and (LD).

Theorem 3. Consider problem (HSDP). For any $\mu > 0$, there is a unique $(y, x, \tau, \theta, s, \kappa)$
in $\mathcal{F}_h^0$, such that

$$\begin{pmatrix} x \circ s \\ \tau\kappa \end{pmatrix} = \mu\mathbf{1}.$$

Moreover, $(x, \tau) = (\mathbf{1}, 1)$, $(y, s, \kappa) = (\mathbf{0}, \mathbf{1}, 1)$ and $\theta = 1$ is the solution with $\mu = 1$.

Theorem 3 defines an endogenous path associated with (HSDP):

$$\mathcal{C} = \left\{ (y, x, \tau, \theta, s, \kappa) \in \mathcal{F}_h^0 : \begin{pmatrix} x \circ s \\ \tau\kappa \end{pmatrix} = \frac{x^T s + \tau\kappa}{n+1}\,\mathbf{1} \right\}.$$

Furthermore, the potential function for (HSDP) can be defined as

$$\psi_{n+1+\rho}(x, \tau, s, \kappa) = (n+1+\rho)\log(x^T s + \tau\kappa) - \sum_{j=1}^{n}\log(x_j s_j) - \log(\tau\kappa), \qquad (5.19)$$

where $\rho \ge 0$. One can then apply the interior-point algorithms described earlier to
solve (HSDP) from the initial point $(x, \tau) = (\mathbf{1}, 1)$, $(y, s, \kappa) = (\mathbf{0}, \mathbf{1}, 1)$ and $\theta = 1$
with $\mu = (x^T s + \tau\kappa)/(n + 1) = 1$.

The HSDP method outlined above enjoys the following properties: 

• It does not require regularity assumptions concerning the existence of optimal, 
feasible, or interior feasible solutions. 

• It can be initiated at $x = \mathbf{1}$, $y = \mathbf{0}$ and $s = \mathbf{1}$, feasible or infeasible, on the
central ray of the positive orthant (cone), and it does not require a big-M penalty
parameter or lower bound.

• Each iteration solves a system of linear equations whose dimension is almost the 
same as that used in the standard (primal-dual) interior-point algorithms. 

• If the linear program has a solution, the algorithm generates a sequence that 
approaches feasibility and optimality simultaneously; if the problem is infeasible 
or unbounded, the algorithm produces an infeasibility certificate for at least one 
of the primal and dual problems; see Exercise 5. 



5.8 Summary 

The simplex method has for decades been an efficient method for solving linear pro- 
grams, despite the fact that there are no theoretical results to support its efficiency. 
Indeed, it was shown that in the worst case, the method may visit every vertex of 
the feasible region and this can be exponential in the number of variables and con- 
straints. If on practical problems the simplex method behaved according to the worst 
case, even modest problems would require years of computer time to solve. The 
ellipsoid method was the first method that was proved to converge in time proportional
to a polynomial in the size of the program, rather than to an exponential in the
size. However, in practice, it was disappointingly slower than the simplex method.
Later, the interior-point method of Karmarkar significantly advanced the field of lin- 
ear programming, for it not only was proved to be a polynomial-time method, but it 
was found in practice to be faster than the simplex method when applied to general 
linear programs. 

The interior-point method is based on introducing a logarithmic barrier function
with a weighting parameter $\mu$; and now there is a general theoretical structure defining
the analytic center, the central path of solutions as $\mu \to 0$, and the duals of these
concepts. This structure is useful for specifying and analyzing various versions of
interior point methods.

Most methods employ a step of Newton’s method to find a point near the central
path when moving from one value of $\mu$ to another. One approach is the predictor-
corrector method, which first takes a step in the direction of decreasing $\mu$ and then a
corrector step to get closer to the central path. Another method employs a potential 
function whose value can be decreased at each step, which guarantees convergence 
and assures that intermediate points simultaneously make progress toward the solu- 
tion while remaining close to the central path. 

Complete algorithms based on these approaches require a number of other fea- 
tures and details. For example, once systematic movement toward the solution is 
terminated, a final phase may move to a nearby vertex or to a non-vertex point on 
a face of the constraint set. Also, an initial phase must be employed to obtain a
feasible point that is close to the central path from which the steps of the search
algorithm can be started. These features are incorporated into several commercial 
software packages, and generally they perform well, able to solve very large linear 
programs in reasonable time. 



5.9 Exercises 

1. Using the simplex method, solve the program (5.1) and count the number of 
pivots required. 

2. Prove the volume reduction rate in Theorem 1 for the ellipsoid method. 

3. Develop a cutting plane method, based on the ellipsoid method, to find a point
satisfying convex inequalities

$$f_i(x) \le 0, \quad i = 1, \ldots, m, \qquad |x|^2 \le R^2,$$

where the $f_i$’s are convex functions of $x$ in $C^1$.

4. Consider the linear program (5.5) and assume that $\mathring{\mathcal{F}}_p = \{x : Ax = b,\ x > 0\}$
is nonempty and its optimal solution set is bounded. Show that the dual of the
problem has a nonempty interior.

5. (Farkas’ lemma) Prove: Exactly one of the feasible sets $\{x : Ax = b,\ x \ge 0\}$
and $\{y : y^T A \le 0,\ y^T b = 1\}$ is nonempty. A vector $y$ in the latter set is called an
infeasibility certificate for the former.

6. (Strict complementarity) Consider any linear program in standard form and its
dual and let both of them be feasible. Then, there always exists a strictly complementary
solution pair, $(x^*, y^*, s^*)$, such that

$$x_j^* s_j^* = 0 \quad \text{and} \quad x_j^* + s_j^* > 0 \quad \text{for all } j.$$

Moreover, the supports of $x^*$ and $s^*$, $P^* = \{j : x_j^* > 0\}$ and $Z^* = \{j : s_j^* > 0\}$,
are invariant among all strictly complementary solution pairs.

7. (Central path theorem) Let $(x(\mu), y(\mu), s(\mu))$ be the central path of (5.9). Then
prove

(a) The central path point $(x(\mu), y(\mu), s(\mu))$ is bounded for $0 < \mu \le \mu^0$ and any
given $0 < \mu^0 < \infty$.

(b) For $0 < \mu' < \mu$,

$$c^T x(\mu') \le c^T x(\mu) \quad \text{and} \quad b^T y(\mu') \ge b^T y(\mu).$$

Furthermore, if $x(\mu') \ne x(\mu)$ and $y(\mu') \ne y(\mu)$,

$$c^T x(\mu') < c^T x(\mu) \quad \text{and} \quad b^T y(\mu') > b^T y(\mu).$$

(c) $(x(\mu), y(\mu), s(\mu))$ converges to an optimal solution pair for (LP) and (LD).
Moreover, the limit point $x(0)_{P^*}$ is the analytic center on the primal optimal
face, and the limit point $s(0)_{Z^*}$ is the analytic center on the dual optimal
face, where $(P^*, Z^*)$ is the strict complementarity partition of the index set
$\{1, 2, \ldots, n\}$.

8. Consider a primal-dual interior point $(x, y, s) \in \mathcal{N}(\eta)$ where $\eta < 1$. Prove that
there is a fixed quantity $\delta > 0$ such that

$$x_j \ge \delta, \quad \text{for all } j \in P^*$$

and

$$s_j \ge \delta, \quad \text{for all } j \in Z^*,$$

where $(P^*, Z^*)$ is defined in Exercise 6.

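9. (Potential level theorem) Define the potential level set

$$\Psi(\delta) := \{(x, y, s) \in \mathring{\mathcal{F}} : \psi_{n+\rho}(x, s) \le \delta\}.$$

Prove

(a) $\Psi(\delta^1) \subset \Psi(\delta^2)$ if $\delta^1 \le \delta^2$.

(b) For every $\delta$, $\Psi(\delta)$ is bounded and its closure $\bar{\Psi}(\delta)$ has non-empty intersection
with the solution set.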
10. Given $0 < x$, $0 < s \in E^n$, show that

$$n\log(x^T s) - \sum_{j=1}^{n}\log(x_j s_j) \ge n\log n$$

and

$$x^T s \le \exp\left[\frac{\psi_{n+\rho}(x, s) - n\log n}{\rho}\right].$$

11. (Logarithmic approximation) If $d \in E^n$ such that $|d|_\infty < 1$ then

$$\mathbf{1}^T d \ \ge\ \sum_{i=1}^{n}\log(1 + d_i) \ \ge\ \mathbf{1}^T d - \frac{|d|^2}{2(1 - |d|_\infty)}.$$

[Note: If $d = (d_1, d_2, \ldots, d_n)$ then $|d|_\infty \equiv \max_i\{|d_i|\}$.]

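12. Let the direction $(d_x, d_y, d_s)$ be generated by system (5.13) with $\gamma = n/(n + \rho)$
and $\mu = x^T s/n$, and let the step size be

$$\alpha = \frac{\theta\sqrt{\min(Xs)}}{\left|(XS)^{-1/2}\left(\frac{x^T s}{n+\rho}\mathbf{1} - Xs\right)\right|}, \qquad (5.20)$$

where $\theta$ is a positive constant less than 1. Let

$$x^+ = x + \alpha d_x, \quad y^+ = y + \alpha d_y, \quad \text{and} \quad s^+ = s + \alpha d_s.$$

Then, using Exercise 11 and the concavity of the logarithmic function show
$(x^+, y^+, s^+) \in \mathring{\mathcal{F}}$ and

$$\psi_{n+\rho}(x^+, s^+) - \psi_{n+\rho}(x, s) \le -\theta\sqrt{\min(Xs)}\left|(XS)^{-1/2}\left(\mathbf{1} - \frac{n+\rho}{x^T s}Xs\right)\right| + \frac{\theta^2}{2(1-\theta)}.$$

13. Let $v = Xs$ in Exercise 12. Prove

$$\sqrt{\min(v)}\left|V^{-1/2}\left(\mathbf{1} - \frac{n+\rho}{\mathbf{1}^T v}v\right)\right| \ge \sqrt{3/4},$$

where $V$ is the diagonal matrix of $v$. Thus, the two exercises imply

$$\psi_{n+\rho}(x^+, s^+) - \psi_{n+\rho}(x, s) \le -\theta\sqrt{3/4} + \frac{\theta^2}{2(1-\theta)} = -\delta$$

for a constant $\delta$. One can verify that $\delta > 0.2$ when $\theta = 0.4$.

14. Prove property (5.18) for (HSDP).

15. Prove Theorem 1.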
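The numerical claim at the end of Exercise 13 is easy to confirm. A minimal sketch follows; the function name `potential_reduction` is an assumption made for illustration.

```python
import math

def potential_reduction(theta):
    """Guaranteed decrease delta = theta*sqrt(3/4) - theta^2 / (2(1 - theta))."""
    return theta * math.sqrt(0.75) - theta ** 2 / (2.0 * (1.0 - theta))

# Value of delta for the step-size constant theta = 0.4 quoted in Exercise 13.
delta = potential_reduction(0.4)
```

With $\theta = 0.4$ the two terms are roughly $0.346$ and $0.133$, so `delta` is about $0.213$, above the stated $0.2$.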
Chapter 6

Conic Linear Programming 



6.1 Convex Cones 

Conic Linear Programming, hereafter CLP, is a natural extension of linear
programming (LP). In LP, the variables form a vector which is required to be
componentwise nonnegative, while in CLP they are points in a pointed convex cone (see
Appendix B.1) of a Euclidean space, such as vectors as well as matrices of finite
dimensions. For example, semidefinite programming (SDP) is a kind of CLP, where
the variable points are symmetric matrices constrained to be positive semidefinite.
Both types of problems may have linear equality constraints as well. Although CLPs
have long been known to be convex optimization problems, no efficient solution
algorithm was known until about two decades ago, when it was discovered that
interior-point algorithms for LP discussed in Chap. 5 can be adapted to solve certain
CLPs with both theoretical and practical efficiency. During the same period, it
was discovered that CLP, especially SDP, is representative of a wide assortment of
applications, including combinatorial optimization, statistical computation, robust
optimization, Euclidean distance geometry, quantum computing, optimal control,
etc. CLP is now widely recognized as a powerful mathematical computation model
of general importance.

First, we illustrate several convex cones popularly used in conic linear optimiza- 
tion. 

Example 1. The following are all (closed) convex cones.

• The n-dimensional non-negative orthant, $E^n_+ = \{x \in E^n : x \ge 0\}$, is a convex
cone.

• The set of all n-dimensional symmetric positive semidefinite matrices, denoted
by $\mathcal{S}^n_+$, is a convex cone, called the positive semidefinite matrix cone. When $X$ is
positive semidefinite (positive definite), we often write the property as $X \succeq (\succ)\ 0$.

• The set $\{(u; x) \in E^{n+1} : u \ge |x|_p\}$ is a convex cone in $E^{n+1}$, called the p-order
cone where $1 \le p \le \infty$. When $p = 2$, the cone is called the second-order cone or
“Ice-cream” cone.

Sometimes, we use the notion of conic inequalities $P \succeq_K Q$ or $Q \preceq_K P$, in which
cases we simply mean $P - Q \in K$.

Suppose $A$ and $B$ are $k \times n$ matrices. We define the inner product

$$A \bullet B = \operatorname{trace}(A^T B) = \sum_{i,j} a_{ij} b_{ij}.$$

When $k = 1$, they become n-dimensional vectors and the inner product is the standard
dot product of two vectors. In SDP, this definition is almost always used for the
case where the matrices are both square and symmetric. The matrix norm associated
with the inner product is called the Frobenius norm:

$$|X|_f = \sqrt{X \bullet X}.$$

For a cone $K$, the dual of $K$ is the cone

$$K^* := \{Y : X \bullet Y \ge 0 \ \text{for all } X \in K\}.$$

It is not difficult to see that each of the first two cones in Example 1 is its own
dual, while the dual cone of the p-order cone is the q-order cone
where

$$\frac{1}{p} + \frac{1}{q} = 1.$$

One can see that when $p = 2$, $q = 2$ as well; that is, they are both 2-order cones. For
a closed convex cone $K$, the dual of the dual cone is $K$ itself.
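The duality between the p-order and q-order cones rests on Hölder's inequality, $|x \cdot y| \le |x|_p |y|_q$, so that $uv + x \cdot y \ge 0$ whenever $u \ge |x|_p$ and $v \ge |y|_q$. The sketch below samples random points to illustrate this for $p = 3$; the helper names and the sampling scheme are assumptions made for the illustration, and a random test of this kind is evidence, not a proof.

```python
import random

def p_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def conjugate(p):
    """Return q with 1/p + 1/q = 1 (requires 1 < p < infinity)."""
    return p / (p - 1.0)

def random_cone_point(n, p, rng):
    """Sample a point (u; x) with u >= |x|_p."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    u = p_norm(x, p) + rng.uniform(0.0, 1.0)
    return u, x

rng = random.Random(0)
p = 3.0
q = conjugate(p)

# Smallest inner product observed between points of the two cones.
min_inner = float("inf")
for _ in range(1000):
    u, x = random_cone_point(4, p, rng)
    v, y = random_cone_point(4, q, rng)
    inner = u * v + sum(a * b for a, b in zip(x, y))
    min_inner = min(min_inner, inner)
```

For $p = 3$ the conjugate exponent is $q = 1.5$, and no sampled pair gives a negative inner product.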



6.2 Conic Linear Programming Problem 

Now let $C$ and $A_i$, $i = 1, 2, \ldots, m$, be given matrices of $E^{k \times n}$, $b \in E^m$, and $K$ be
a closed convex cone in $E^{k \times n}$. And let $X$ be an unknown matrix of $E^{k \times n}$. Then, the
standard form (primal) conic linear programming problem is

(CLP) minimize $C \bullet X$
subject to $A_i \bullet X = b_i$, $i = 1, 2, \ldots, m$, $X \in K$. (6.1)

Note that in CLP we minimize a linear function of the decision matrix constrained
in cone $K$ and subject to linear equality constraints.
For convenience, we define an operator from a symmetric matrix to a vector:

$$\mathcal{A}X = \begin{pmatrix} A_1 \bullet X \\ A_2 \bullet X \\ \vdots \\ A_m \bullet X \end{pmatrix}. \qquad (6.2)$$

Then, CLP can be written in a compact form:

(CLP) minimize $C \bullet X$
subject to $\mathcal{A}X = b$, $X \in K$.

When cone $K$ is the non-negative orthant $E^n_+$, CLP reduces to linear programming
(LP) in the standard form, where $\mathcal{A}$ becomes the constraint matrix $A$. When $K$ is the
positive semidefinite cone $\mathcal{S}^n_+$, CLP is called semidefinite programming (SDP); and
when $K$ is the p-order cone, it is called p-order cone programming. In particular,
when $p = 2$, the model is called second-order cone programming (SOCP). Frequently,
we write variable $X$ in (CLP) as $x$ if it is indeed a vector, such as when $K$ is
the nonnegative orthant or p-order cone.

One can see that the problem (SDP) (that is, (6.1) with the semidefinite cone)
generalizes classical linear programming in standard form:

minimize $c^T x$,
subject to $Ax = b$, $x \ge 0$.

Define $C = \operatorname{Diag}[c_1, c_2, \ldots, c_n]$, and let $A_i = \operatorname{Diag}[a_{i1}, a_{i2}, \ldots, a_{in}]$ for $i =
1, 2, \ldots, m$. The unknown is the $n \times n$ symmetric matrix $X$ which is constrained by
$X \succeq 0$. Since $C \bullet X$ and $A_i \bullet X$ depend only on the diagonal elements
of $X$, we may restrict the solutions $X$ to diagonal matrices. It follows that in this
case the SDP problem is equivalent to a linear program, since a diagonal matrix
is positive semidefinite if and only if all its diagonal elements are nonnegative.
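The key step in this reduction, that $\operatorname{Diag}[c] \bullet X$ sees only the diagonal of $X$, can be illustrated with a few lines of plain Python. The helper names `frob_inner` and `diag` and the sample matrix are assumptions introduced for the illustration.

```python
def frob_inner(A, B):
    """A • B = trace(A^T B) = sum over i, j of a_ij * b_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def diag(v):
    """Diagonal matrix with the entries of v on the main diagonal."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

c = [2.0, 1.0, 1.0]
# An arbitrary symmetric matrix with nonzero off-diagonal entries.
X = [[0.5, 0.1, -0.2],
     [0.1, 0.3, 0.05],
     [-0.2, 0.05, 0.2]]

lhs = frob_inner(diag(c), X)                    # Diag[c] • X
rhs = sum(c[i] * X[i][i] for i in range(3))     # c^T diag(X)
```

Both quantities agree, so replacing $X$ by its diagonal part changes neither the objective nor the constraint values.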

One can further see the role of cones in the following examples. 

Example 1. Consider the following optimization problems with three variables.

• This is a linear programming problem in standard form:

minimize $2x_1 + x_2 + x_3$
subject to $x_1 + x_2 + x_3 = 1$,
$(x_1; x_2; x_3) \ge 0$.

• This is a semidefinite programming problem where the dimension of the matrix
is two:

minimize $2x_1 + x_2 + x_3$
subject to $x_1 + x_2 + x_3 = 1$,
$\begin{pmatrix} x_1 & x_2 \\ x_2 & x_3 \end{pmatrix} \succeq 0$.

Let

$$C = \begin{pmatrix} 2 & .5 \\ .5 & 1 \end{pmatrix} \quad \text{and} \quad A_1 = \begin{pmatrix} 1 & .5 \\ .5 & 1 \end{pmatrix}.$$

Then, the problem can be written in a standard SDP form

minimize $C \bullet X$
subject to $A_1 \bullet X = 1$, $X \in \mathcal{S}^2_+$.

• This is a second-order cone programming problem:

minimize $2x_1 + x_2 + x_3$
subject to $x_1 + x_2 + x_3 = 1$,
$\sqrt{x_2^2 + x_3^2} \le x_1$.



We present several application examples to illustrate the flexibility of this formu- 
lation. 

Example 2 (Binary Quadratic Optimization). Consider a binary quadratic maximization
problem

maximize $x^T Q x + 2c^T x$
subject to $x_j \in \{1, -1\}$, for all $j = 1, \ldots, n$,

which is a difficult nonconvex optimization problem. The problem can be rewritten
as

$$z^* \equiv \text{maximize} \ \begin{pmatrix} x \\ 1 \end{pmatrix}^T \begin{pmatrix} Q & c \\ c^T & 0 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix}$$

subject to $(x_j)^2 = 1$, for all $j = 1, \ldots, n$,

which can be also written as a homogeneous quadratic binary problem

$$z^* \equiv \text{maximize} \ \begin{pmatrix} Q & c \\ c^T & 0 \end{pmatrix} \bullet \begin{pmatrix} x \\ x_{n+1} \end{pmatrix} \begin{pmatrix} x \\ x_{n+1} \end{pmatrix}^T$$

$$\text{subject to} \ I_j \bullet \begin{pmatrix} x \\ x_{n+1} \end{pmatrix} \begin{pmatrix} x \\ x_{n+1} \end{pmatrix}^T = 1, \ \text{for all } j = 1, \ldots, n+1,$$

where $I_j$ is the $(n + 1) \times (n + 1)$ matrix whose components are all zero except at the
jth position on the main diagonal where it is 1. Let $(x^*; x^*_{n+1})$ be an optimal solution
for the homogeneous problem. Then, one can see that $x^*/x^*_{n+1}$ would be an optimal
solution to the original problem.

Since $\begin{pmatrix} x \\ x_{n+1} \end{pmatrix} \begin{pmatrix} x \\ x_{n+1} \end{pmatrix}^T$ forms a positive semidefinite matrix (with rank equal to 1),
a semidefinite relaxation of the problem is defined as

$$z^{SDP} \equiv \text{maximize} \ \begin{pmatrix} Q & c \\ c^T & 0 \end{pmatrix} \bullet Y$$

$$\text{subject to} \ I_j \bullet Y = 1, \ \text{for all } j = 1, \ldots, n+1, \qquad Y \in \mathcal{S}^{n+1}_+, \qquad (6.3)$$

where the symmetric matrix $Y$ has dimension $n + 1$. Obviously, $z^{SDP}$ is an upper
bound of $z^*$, since the rank-1 requirement is not enforced in the relaxation.

Let’s see how to use the relaxation. For simplicity, assuming $z^{SDP} > 0$, it has been
shown that in many cases of this problem an optimal SDP solution either constitutes
an exact solution or can be rounded to a good approximate solution of the original
problem. In the former case, one can show that a rank-1 optimal solution matrix $Y$
exists for the semidefinite relaxation and it can be found by using a rank-reduction
procedure. For the latter case, one can, using a randomized rank-reduction procedure
or the principal components of $Y$, find a rank-1 feasible solution matrix $\hat{Y}$ such that

$$\begin{pmatrix} Q & c \\ c^T & 0 \end{pmatrix} \bullet \hat{Y} \ \ge\ \alpha \cdot z^{SDP} \ \ge\ \alpha \cdot z^*$$

for a provable factor $0 < \alpha \le 1$. Thus, one can find a feasible solution to the
original problem whose objective value is no less than a factor $\alpha$ of the true maximal
objective cost.
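The lifting behind the relaxation can be checked by brute force on a tiny instance: for every binary $x$, the rank-1 matrix $Y = vv^T$ with $v = (x; 1)$ has unit diagonal and reproduces the original objective, so every binary point is feasible for (6.3). In the sketch below, the matrix $Q$, the vector $c$, and the function name `lifted_objective` are assumptions chosen for illustration; no SDP solver is involved.

```python
from itertools import product

def lifted_objective(Q, c, x):
    """Return (x^T Q x + 2 c^T x, M • Y, Y) with M = [[Q, c], [c^T, 0]] and Y = v v^T, v = (x; 1)."""
    n = len(x)
    direct = (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
              + 2 * sum(c[i] * x[i] for i in range(n)))
    M = [Q[i] + [c[i]] for i in range(n)] + [c + [0]]
    v = list(x) + [1]
    Y = [[v[i] * v[j] for j in range(n + 1)] for i in range(n + 1)]
    lifted = sum(M[i][j] * Y[i][j] for i in range(n + 1) for j in range(n + 1))
    return direct, lifted, Y

# Small illustrative instance with integer data (an assumption).
Q = [[0, 1, -2],
     [1, 0, 3],
     [-2, 3, 0]]
c = [1, -1, 2]

results = [lifted_objective(Q, c, x) for x in product((1, -1), repeat=3)]
z_star = max(direct for direct, _, _ in results)   # exact optimum by enumeration
```

Since every lifted matrix is feasible for the relaxation with the same objective value, the enumeration value `z_star` cannot exceed $z^{SDP}$.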



Example 3 (Sensor Localization). This problem is that of determining the location
of sensors (for example, several cell phones scattered in a building) when measurements
of some of their separation Euclidean distances can be determined, but their
specific locations are not known. In general, suppose there are n unknown points
$x_j \in E^d$, $j = 1, \ldots, n$. We consider an edge to be a path between two points,
say, i and j. There is a known subset $N_e$ of pairs (edges) $ij$ for which the separation
distance $d_{ij}$ is known. For example, this distance might be determined by the signal
strength or delay time between the points. Typically, in the cell phone example, $N_e$
contains those edges whose lengths are small so that there is a strong radio signal.
Then, the localization problem is to find locations $x_j$, $j = 1, \ldots, n$, such that

$$|x_i - x_j|^2 = (d_{ij})^2, \quad \text{for all } (i, j) \in N_e,$$

subject to possible rotation and translation. (If the locations of some of the sensors
are known, these may be sufficient to determine the rotation and translation as well.)
Let $X = [x_1\ x_2\ \ldots\ x_n]$ be the $d \times n$ matrix to be determined. Then

$$|x_i - x_j|^2 = (e_i - e_j)^T X^T X (e_i - e_j),$$

where $e_i \in E^n$ is the vector with 1 at the ith position and zero everywhere else. Let
$Y = X^T X$. Then the semidefinite relaxation of the localization problem is to find $Y$
such that

$$(e_i - e_j)(e_i - e_j)^T \bullet Y = (d_{ij})^2, \quad \text{for all } (i, j) \in N_e,$$
$$Y \succeq 0.$$

This problem is one of finding a feasible solution; the objective function is null. But
if the distance measurements have noise, one can add additional variables and an
error objective to minimize. For example,

minimize $\sum_{(i,j) \in N_e} |z_{ij}|$
subject to $(e_i - e_j)(e_i - e_j)^T \bullet Y + z_{ij} = (d_{ij})^2$, for all $(i, j) \in N_e$,
$Y \succeq 0$.

This problem can be converted into a conic linear program with mixed nonnegative
orthant and semidefinite cones.

Under certain graph structures, an optimal SDP solution $Y$ of the formulation is
guaranteed to have rank d, so that it constitutes an exact solution of the original problem.
Also, in general $Y$ can be rounded to a good approximate solution of the original
problem. For example, one can, using a randomized rank-reduction procedure or the
d principal components of $Y$, find a rank-d solution matrix $\hat{Y}$.
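The identity that makes the distance constraints linear in $Y$ is $(e_i - e_j)(e_i - e_j)^T \bullet Y = Y_{ii} - 2Y_{ij} + Y_{jj}$ with $Y = X^T X$. A minimal sketch with three points in the plane follows; the coordinates and the function names are assumptions chosen for the illustration.

```python
def gram(X):
    """Y = X^T X for a d x n matrix X stored as a list of d rows."""
    d, n = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(d)) for j in range(n)]
            for i in range(n)]

def sq_dist_from_gram(Y, i, j):
    """(e_i - e_j)(e_i - e_j)^T • Y = Y_ii - 2 Y_ij + Y_jj."""
    return Y[i][i] - 2 * Y[i][j] + Y[j][j]

# Columns of X are the sensor positions (0, 0), (3, 4) and (1, 1).
X = [[0, 3, 1],   # first coordinates
     [0, 4, 1]]   # second coordinates
Y = gram(X)

d01 = sq_dist_from_gram(Y, 0, 1)   # squared distance between sensors 0 and 1
d02 = sq_dist_from_gram(Y, 0, 2)
d12 = sq_dist_from_gram(Y, 1, 2)
```

The values recovered from $Y$ match the squared Euclidean distances computed directly from the coordinates, which is why each measured distance becomes one linear equation in $Y$.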
6.3 Farkas’ Lemma for Conic Linear Programming

We first introduce the notion of “interior” of cones.

Definition 1. We call $X$ an interior point of cone $K$ if and only if, for any point
$Y \in K^*$, $Y \bullet X = 0$ implies $Y = 0$.

The set of interior points of $K$ is denoted by $\mathring{K}$.

Theorem 1. The interiors of the following convex cones are given as:

• The interior of the non-negative orthant cone is the set of all vectors where every entry
is positive.

• The interior of the positive semidefinite cone is the set of all positive definite matrices.

• The interior of the p-order cone is the set $\{(u; x) \in E^{n+1} : u > |x|_p\}$.

We give a sketch of the proof for the second-order cone, i.e., $p = 2$. Let $(\bar{u}; \bar{x}) \ne 0$
be any second-order cone point but $\bar{u} = |\bar{x}|$. Then, we can choose a dual cone (also
the second-order cone) point $(v; y)$ such that

$$v = \alpha\bar{u}, \quad y = -\alpha\bar{x},$$

for a positive $\alpha$. Note that

$$(\bar{u}; \bar{x}) \bullet (v; y) = \alpha\bar{u}^2 - \alpha|\bar{x}|^2 = 0.$$

Then, one can let $\alpha \to \infty$ so that $(v; y)$ cannot be bounded.

Now let $(u; x)$ be any given second-order cone point with $u > |x|$. We would like to prove
that, for any dual cone (also the second-order cone) point $(v; y)$,

$$(u; x) \bullet (v; y) = 0$$

implies that $(v; y)$ is bounded. Note that

$$0 = (u; x) \bullet (v; y) = uv + x \bullet y$$

or

$$uv = -x \bullet y \le |x||y|.$$

If $v = 0$, we must have $y = 0$; otherwise,

$$u \le |x||y|/v \le |x|,$$

which contradicts $u > |x|$.

We leave the proof of the following proposition as an exercise.

Proposition 1. Let $X \in \mathring{K}$ and $Y \in K^*$. Then for any nonnegative constant $\kappa$, $Y \bullet X \le \kappa$
implies that $Y$ is bounded.

Let us now consider the feasible region of (CLP) (6.1):

$$\mathcal{F} := \{X : \mathcal{A}X = b,\ X \in K\};$$

where the interior of the feasible region is

$$\mathring{\mathcal{F}} := \{X : \mathcal{A}X = b,\ X \in \mathring{K}\}.$$

If $\mathcal{F}$ is empty with $K = E^n_+$, from Farkas’ lemma for linear programming, a vector
$y \in E^m$, with $y^T A \le 0$ and $y^T b > 0$, always exists and is called an infeasibility
certificate for the system $\{x : Ax = b,\ x \ge 0\}$.

Does this alternative relation hold when $K$ is a general closed convex cone? Let
us make the question precise. Let us define the reverse operator of (6.2) from a vector
to a matrix:

$$y^T\mathcal{A} = \sum_{i=1}^{m} y_i A_i. \qquad (6.4)$$

Note that, by the definition, for any matrix $X \in E^{k \times n}$

$$y^T\mathcal{A} \bullet X = y^T(\mathcal{A}X),$$

that is, the association property holds. Also, $(y^T\mathcal{A})^T = \mathcal{A}^T y$, that is, the transpose
operation applies here as well.
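The association property $y^T\mathcal{A} \bullet X = y^T(\mathcal{A}X)$ can be verified on a small instance. The sketch below implements the operator (6.2) and its reverse (6.4) in plain Python; the function names `op` and `adjoint` and the data are assumptions made for the illustration.

```python
def inner(A, B):
    """A • B = sum over i, j of a_ij * b_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def op(As, X):
    """The operator of (6.2): X -> (A_1 • X; ...; A_m • X)."""
    return [inner(Ai, X) for Ai in As]

def adjoint(As, y):
    """The reverse operator of (6.4): y -> sum_i y_i A_i."""
    rows, cols = len(As[0]), len(As[0][0])
    return [[sum(y[i] * As[i][r][c] for i in range(len(As))) for c in range(cols)]
            for r in range(rows)]

As = [[[1, 0], [0, 0]],
      [[0, 1], [1, 0]]]
X = [[2, -1], [-1, 3]]
y = [4, 5]

lhs = inner(adjoint(As, y), X)                        # (y^T A) • X
rhs = sum(yi * vi for yi, vi in zip(y, op(As, X)))    # y^T (A X)
```

Both sides evaluate to the same number, as the definition requires.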

Then, the question becomes: when $\mathcal{F}$ is empty, does there exist a vector $y \in
E^m$ such that $-y^T\mathcal{A} \in K^*$ and $y^T b > 0$? Similarly, one can ask: when set
$\{y : C^T - y^T\mathcal{A} \in K\}$ is empty, does there exist a matrix $X \in K^*$ such that $\mathcal{A}X = 0$
and $C \bullet X < 0$? Note that the answer to the second question is also “yes” when
$K = E^n_+$.

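Example 1. The answer to either question is "not necessarily"; see the examples below.

• For the first question, consider $K = S^2_+$ and

$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}.$$

• For the second question, consider $K = S^2_+$ and

$$C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$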
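However, if the data set $\mathcal{A}$ satisfies additional conditions, the answer would be
"yes"; see the theorem below.

Theorem 2 (Farkas' Lemma for CLP). We have

• Consider set

$$\mathcal{F}_p := \{X : \mathcal{A}X = \mathbf{b},\ X \in K\}.$$

Suppose that there exists a vector $\bar{\mathbf{y}}$ such that $-\bar{\mathbf{y}}^T \mathcal{A} \in \mathring{K}^*$. Then,

1. Set $\mathcal{C} := \{\mathcal{A}X \in E^m : X \in K\}$ is a closed convex set;

2. $\mathcal{F}_p$ has a (feasible) solution if and only if set $\{\mathbf{y} : -\mathbf{y}^T \mathcal{A} \in K^*,\ \mathbf{y}^T \mathbf{b} > 0\}$ has no
feasible solution.

• Consider set

$$\mathcal{F}_d := \{\mathbf{y} : C^T - \mathbf{y}^T \mathcal{A} \in K\}.$$

Suppose that there exists a vector $\bar{X} \in \mathring{K}^*$ such that $\mathcal{A}\bar{X} = \mathbf{0}$. Then,

1. Set $\mathcal{C} := \{S - \mathbf{y}^T \mathcal{A} : S \in K\}$ is a closed convex set;

2. $\mathcal{F}_d$ has a (feasible) solution if and only if set $\{X : \mathcal{A}X = \mathbf{0},\ X \in K^*,\ C \bullet X < 0\}$ has
no feasible solution.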



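Proof. We prove the first statement of the theorem, starting with its first part. It is
clear that $\mathcal{C}$ is a convex set. To prove that $\mathcal{C}$ is a closed set, we need to show that
if $\mathbf{y}^k := \mathcal{A}X^k \in E^m$ for $X^k \in K$, $k = 1, \ldots$, converges to a vector $\hat{\mathbf{y}}$, then $\hat{\mathbf{y}} \in \mathcal{C}$, that is,
there is $\hat{X} \in K$ such that $\hat{\mathbf{y}} = \mathcal{A}\hat{X}$. Without loss of generality, we assume that $\mathbf{y}^k$ is
a bounded sequence. Then, we have, for a positive constant $c$,

$$c \ge -\bar{\mathbf{y}}^T \mathbf{y}^k = -\bar{\mathbf{y}}^T (\mathcal{A}X^k) = -\bar{\mathbf{y}}^T \mathcal{A} \bullet X^k, \quad \forall k.$$

Since $-\bar{\mathbf{y}}^T \mathcal{A} \in \mathring{K}^*$, by Proposition 1, the sequence of $X^k$ is also bounded. Then there is
at least one accumulation point $\hat{X} \in K$ because $K$ is a closed cone. Thus, we must have
$\hat{\mathbf{y}} = \mathcal{A}\hat{X}$.

We now prove the second part. Suppose $\mathcal{F}_p$ has a feasible solution $\bar{X}$, and let $\mathbf{y}$ satisfy
$-\mathbf{y}^T \mathcal{A} \in K^*$. Then

$$-\mathbf{y}^T \mathbf{b} = -\mathbf{y}^T (\mathcal{A}\bar{X}) = -\mathbf{y}^T \mathcal{A} \bullet \bar{X} \ge 0.$$

Thus, it must be true that $\mathbf{y}^T \mathbf{b} \le 0$, that is, $\{\mathbf{y} : -\mathbf{y}^T \mathcal{A} \in K^*,\ \mathbf{y}^T \mathbf{b} > 0\}$ must be empty.

On the other hand, let $\mathcal{F}_p$ have no feasible solution, or equivalently, $\mathbf{b} \notin \mathcal{C}$. We
now show that $\{\mathbf{y} : -\mathbf{y}^T \mathcal{A} \in K^*,\ \mathbf{y}^T \mathbf{b} > 0\}$ must be nonempty.

Since $\mathcal{C}$ is a closed convex set, from the separating hyperplane theorem, there
must exist a $\bar{\mathbf{y}} \in E^m$ such that

$$\bar{\mathbf{y}}^T \mathbf{b} > \bar{\mathbf{y}}^T \mathbf{y}, \quad \forall \mathbf{y} \in \mathcal{C},$$

or, from $\mathbf{y} = \mathcal{A}X$, $X \in K$, we have

$$\bar{\mathbf{y}}^T \mathbf{b} > \bar{\mathbf{y}}^T (\mathcal{A}X) = \bar{\mathbf{y}}^T \mathcal{A} \bullet X, \quad \forall X \in K.$$

That is, $\bar{\mathbf{y}}^T \mathcal{A} \bullet X$ is bounded above for all $X \in K$.

Immediately, we see $\bar{\mathbf{y}}^T \mathbf{b} > 0$ since $0 \in K$. Next, it must be true that $-\bar{\mathbf{y}}^T \mathcal{A} \in K^*$.
Otherwise, we must be able to find an $\bar{X} \in K$ such that $-\bar{\mathbf{y}}^T \mathcal{A} \bullet \bar{X} < 0$ by the
definition of $K$ and its dual $K^*$. For any positive constant $\alpha$ we maintain $\alpha\bar{X} \in K$
and let $\alpha$ go to $\infty$. Then, $\bar{\mathbf{y}}^T \mathcal{A} \bullet (\alpha\bar{X})$ goes to $\infty$, contradicting the fact that $\bar{\mathbf{y}}^T \mathcal{A} \bullet X$
is bounded above for all $X \in K$. Thus, $\bar{\mathbf{y}}$ is a feasible solution in $\{\mathbf{y} : -\mathbf{y}^T \mathcal{A} \in K^*,\ \mathbf{y}^T \mathbf{b} > 0\}$. ∎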

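Note that $\mathcal{C}$ may not be a closed set if the interior condition of Theorem 2 is not
met. Consider $A_1$, $A_2$ and $\mathbf{b}$ in Example 1, and we have

$$\mathcal{C} = \left\{ \begin{pmatrix} A_1 \bullet X \\ A_2 \bullet X \end{pmatrix} : X \in S^2_+ \right\}.$$

Let

$$X^k = \begin{pmatrix} \frac{1}{k} & 1 \\ 1 & k \end{pmatrix} \in S^2_+, \quad \forall k = 1, \ldots.$$

Then we see

$$\mathbf{y}^k = \mathcal{A}X^k = \begin{pmatrix} \frac{1}{k} \\ 2 \end{pmatrix}.$$

As $k \to \infty$ we see $\mathbf{y}^k$ converges to $\mathbf{b}$, but $\mathbf{b}$ is not in $\mathcal{C}$.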
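The non-closedness example above can be checked numerically. The following is a minimal NumPy sketch, with `image` and `X_k` as hypothetical helper names; it only evaluates the sequence $X^k$ and its images. That $\mathbf{b}$ itself is not attained follows because $A_1 \bullet X = x_{11} = 0$ with $X \succeq 0$ forces $x_{12} = 0$, so $A_2 \bullet X = 0 \ne 2$.

```python
import numpy as np

# Data of Example 1 (first question): A1 and A2 act on 2x2 symmetric matrices.
A1 = np.array([[1.0, 0.0], [0.0, 0.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.array([0.0, 2.0])

def image(X):
    """Return the vector (A1 . X, A2 . X)."""
    return np.array([np.sum(A1 * X), np.sum(A2 * X)])

def X_k(k):
    """The sequence X^k = [[1/k, 1], [1, k]] from the text."""
    return np.array([[1.0 / k, 1.0], [1.0, float(k)]])

ks = (1, 10, 100, 1000)
# Each X^k is positive semidefinite (determinant 0, positive trace).
min_eigs = [np.linalg.eigvalsh(X_k(k)).min() for k in ks]
# The images y^k = (1/k, 2) approach b = (0, 2).
dist_to_b = [float(np.linalg.norm(image(X_k(k)) - b)) for k in ks]
```

The distances shrink like $1/k$, while every $X^k$ stays on the boundary of the semidefinite cone.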
6.4 Conic Linear Programming Duality



Because conic linear programming is an extension of classical linear programming,
it would seem that there is a natural dual to the primal problem, and that this dual
is itself a conic linear program. This is indeed the case, and it is related to the primal
in much the same way as primal and dual linear programs are related. Furthermore,
the primal and dual together lead to the formation of a primal-dual solution method,
which is discussed later in this chapter.

The dual of the (primal) CLP (6.1) is

$$\begin{array}{lll}
(CLD) & \text{maximize} & \mathbf{y}^T \mathbf{b} \\
& \text{subject to} & \sum_{i=1}^m y_i A_i + S = C^T, \ S \in K^*. \qquad (6.5)
\end{array}$$

Or, written in a compact form:

$$\begin{array}{lll}
(CLD) & \text{maximize} & \mathbf{y}^T \mathbf{b} \\
& \text{subject to} & \mathbf{y}^T \mathcal{A} + S = C^T, \ S \in K^*.
\end{array}$$

Notice that $S$ represents a slack matrix, and hence the problem can alternatively be
expressed as

$$\begin{array}{ll}
\text{maximize} & \mathbf{y}^T \mathbf{b} \\
\text{subject to} & \sum_{i=1}^m y_i A_i \preceq_{K^*} C^T. \qquad (6.6)
\end{array}$$

Recall that conic inequality $Q \preceq_K P$ means $P - Q \in K$.

Again, just like linear programming, the dual of (CLD) will be (CLP), and they
form a primal and dual pair. Whichever is taken as the primal, the other will be the dual.
We will see more primal and dual relations later.

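Example 1. Here are dual problems to the three instances in Example 1 where $y$ is
just a scalar.

• The dual to the linear programming instance:

$$\begin{array}{ll}
\text{maximize} & y \\
\text{subject to} & y(1, 1, 1) + (s_1, s_2, s_3) = (2, 1, 1), \\
& \mathbf{s} = (s_1, s_2, s_3) \in K^* = E^3_+.
\end{array}$$

• The dual to the semidefinite programming instance:

$$\begin{array}{ll}
\text{maximize} & y \\
\text{subject to} & yA_1 + S = C, \\
& S \in K^* = S^2_+,
\end{array}$$

where recall

$$C = \begin{pmatrix} 2 & .5 \\ .5 & 1 \end{pmatrix} \quad \text{and} \quad A_1 = \begin{pmatrix} 1 & .5 \\ .5 & 1 \end{pmatrix}.$$

• The dual to the second-order cone instance:

$$\begin{array}{ll}
\text{maximize} & y \\
\text{subject to} & y(1, 1, 1) + (s_1, s_2, s_3) = (2, 1, 1), \\
& \sqrt{s_2^2 + s_3^2} \le s_1 \ \text{or} \ \mathbf{s} = (s_1; s_2, s_3) \ \text{in second-order cone}.
\end{array}$$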
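To make the three dual instances concrete, the sketch below tests whether a given scalar $y$ is dual feasible in each cone. It is only an illustration: the helper names are hypothetical, and the data $(2, 1, 1)$, $(1, 1, 1)$, $C$ and $A_1$ are taken from the instances as written above.

```python
import numpy as np

c_vec = np.array([2.0, 1.0, 1.0])       # right-hand side (2, 1, 1)
a_vec = np.array([1.0, 1.0, 1.0])       # constraint row (1, 1, 1)
C_mat = np.array([[2.0, 0.5], [0.5, 1.0]])
A_mat = np.array([[1.0, 0.5], [0.5, 1.0]])

def in_orthant(s, tol=1e-12):
    return bool(np.all(s >= -tol))

def in_soc(s, tol=1e-12):
    # (s1; s2, s3) is in the second-order cone when s1 >= |(s2, s3)|
    return bool(s[0] + tol >= np.linalg.norm(s[1:]))

def in_psd(S, tol=1e-12):
    return bool(np.linalg.eigvalsh(S).min() >= -tol)

def dual_feasible(y):
    """Dual feasibility of the scalar y for the three instances."""
    s = c_vec - y * a_vec
    return {"lp": in_orthant(s), "sdp": in_psd(C_mat - y * A_mat), "soc": in_soc(s)}
```

For instance, `dual_feasible(1.0)` reports feasibility in all three cones, while `dual_feasible(1.2)` remains feasible only for the second-order cone instance, whose slack cone is larger than the orthant.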

Let us consider a couple more dual examples of the problems we posed earlier.



Example 2 (The Dual of Binary Quadratic Maximization). Consider the semidefinite
relaxation (6.3) for the binary quadratic maximization problem. Its dual is



minimize 
subject to 




S>0. 



Note that

$$\begin{pmatrix} Q & \mathbf{c} \\ \mathbf{c}^T & 0 \end{pmatrix}$$

is exactly the Hessian matrix of the Lagrange function of the quadratic maximization
problem; see Chap. 11. Therefore, there is a close connection between the Lagrange
and conic dualities. The problem is to find a diagonal matrix $\mathrm{Diag}[(y_1; \ldots; y_{n+1})]$
such that the Lagrange Hessian is positive semidefinite and its sum of diagonal
elements is minimized.




Example 3 (The Dual of Sensor Localization). Consider the semidefinite programming
relaxation for the sensor localization problem (with no noise). Its dual is

$$\begin{array}{ll}
\text{maximize} & \sum_{(i,j) \in N_e} y_{ij} d_{ij}^2 \\
\text{subject to} & \sum_{(i,j) \in N_e} y_{ij} (\mathbf{e}_i - \mathbf{e}_j)(\mathbf{e}_i - \mathbf{e}_j)^T + S = 0, \quad S \succeq 0.
\end{array}$$

Here, $y_{ij}$ represents an internal force or tension on edge $(i, j)$. Obviously, $y_{ij} = 0$
for all $(i, j) \in N_e$ is a feasible solution for the dual. However, finding non-trivial
internal forces is a fundamental problem in network and structure design, and the
maximization of the dual would help to achieve the goal.

Many optimization problems can be directly cast in the CLD form. 

Example 4 (Euclidean Facility Location). This problem is to determine the location
of a facility serving $n$ clients placed in a Euclidean space, whose known locations
are denoted by $\mathbf{a}_j \in E^d$, $j = 1, \ldots, n$. The location of the facility should minimize
the sum of the Euclidean distances from the facility to each of the clients. Let the
location decision be vector $\mathbf{f} \in E^d$. Then the problem is

$$\text{minimize} \quad \sum_{j=1}^n |\mathbf{f} - \mathbf{a}_j|.$$

The problem can be reformulated as

$$\begin{array}{ll}
\text{minimize} & \sum_{j=1}^n \delta_j \\
\text{subject to} & \mathbf{s}_j + \mathbf{f} = \mathbf{a}_j, \ \forall j = 1, \ldots, n, \\
& |\mathbf{s}_j| \le \delta_j, \ \forall j.
\end{array}$$

This is a conic formulation in the (CLD) form. To see it clearly, let $d = 2$ and $n = 3$
in the example, and let

$$A^T = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & -1
\end{pmatrix} \in E^{9 \times 5}, \qquad \mathbf{b} = (1; 1; 1; 0; 0) \in E^5,$$

and variable vector

$$\mathbf{y} = [\delta_1; \delta_2; \delta_3; \mathbf{f}] \in E^5.$$

Then, the facility location problem becomes

$$\begin{array}{ll}
\text{minimize} & \mathbf{y}^T \mathbf{b} \\
\text{subject to} & \mathbf{y}^T A + \mathbf{s}^T = \mathbf{c}^T, \ \mathbf{s} \in K;
\end{array}$$

where $K$ is the product of three second-order cones, each of which has dimension 3.
More precisely, the first three elements of $\mathbf{s} \in E^9$ are in the 3-dimensional second-order
cone; and so are the second three elements and the third three elements of $\mathbf{s}$.
In general, the product of (possibly mixed) cones, say $K_1$, $K_2$ and $K_3$, is denoted by
$K_1 \oplus K_2 \oplus K_3$, and $X \in K_1 \oplus K_2 \oplus K_3$ means that $X$ is divided into three components
such that

$$X = (X_1; X_2; X_3), \ \text{where} \ X_1 \in K_1, \ X_2 \in K_2, \ \text{and} \ X_3 \in K_3.$$

The dual of the facility location problem would be in the (CLP) form:

$$\begin{array}{ll}
\text{minimize} & \mathbf{c}^T \mathbf{x} \\
\text{subject to} & A\mathbf{x} = \mathbf{b}, \ \mathbf{x} \in K^*;
\end{array}$$

where

$$K^* = (K_1 \oplus K_2 \oplus K_3)^* = K_1^* \oplus K_2^* \oplus K_3^*.$$

That is, in this particular problem, the first three elements of $\mathbf{x} \in E^9$ are in the
3-dimensional second-order cone; and so are the second three elements and the third
three elements of $\mathbf{x}$.

Considering further the equality constraints, the dual can be simplified as

$$\begin{array}{ll}
\text{maximize} & \sum_{j=1}^3 \mathbf{a}_j^T \mathbf{x}_j \\
\text{subject to} & \sum_{j=1}^3 \mathbf{x}_j = \mathbf{0} \in E^2, \\
& |\mathbf{x}_j| \le 1, \ \forall j = 1, 2, 3.
\end{array}$$
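The simplified dual can be explored numerically. The sketch below is an illustration under stated assumptions: the client locations are made up, and the primal is solved approximately by Weiszfeld's fixed-point iteration, a classical method that is not part of the text. From the approximate facility location it builds the unit vectors $\mathbf{x}_j = (\mathbf{a}_j - \mathbf{f})/|\mathbf{a}_j - \mathbf{f}|$ and compares the primal value $\sum_j |\mathbf{f} - \mathbf{a}_j|$ with the dual value $\sum_j \mathbf{a}_j^T \mathbf{x}_j$.

```python
import numpy as np

def weiszfeld(points, iters=500):
    """Approximate the minimizer f of sum_j |f - a_j| by Weiszfeld's iteration."""
    f = points.mean(axis=0)
    for _ in range(iters):
        dist = np.linalg.norm(points - f, axis=1)
        if np.any(dist < 1e-12):        # f coincides with a client; stop
            break
        w = 1.0 / dist
        f = (w[:, None] * points).sum(axis=0) / w.sum()
    return f

a = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])   # hypothetical client locations
f = weiszfeld(a)
dist = np.linalg.norm(a - f, axis=1)
primal_value = float(dist.sum())

# Dual candidate: unit vectors pointing from the facility to each client.
x = (a - f) / dist[:, None]
dual_value = float(np.sum(a * x))                     # sum_j a_j^T x_j
residual = float(np.linalg.norm(x.sum(axis=0)))       # sum_j x_j should vanish
```

When the facility lies strictly inside the triangle of clients, the stationarity condition of the primal is exactly $\sum_j \mathbf{x}_j = \mathbf{0}$, so the candidate is dual feasible and the two objective values coincide up to the iteration error.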

Example 5 (Quadratic Constraints). Quadratic constraints can be transformed to
linear semidefinite form by using the concept of Schur complements. Let $A$ be a
(symmetric) $m$-dimension positive definite matrix, $C$ be a symmetric $n$-dimension
matrix, and $B$ be an $m \times n$ matrix. Then, matrix

$$S = C - B^T A^{-1} B$$

is called the Schur complement of $A$ in the matrix

$$Z = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix}.$$

Moreover, $Z$ is positive semidefinite if and only if $S$ is positive semidefinite.

Now consider a general quadratic constraint of the form

$$\mathbf{y}^T B^T B \mathbf{y} - \mathbf{c}^T \mathbf{y} - d \le 0. \qquad (6.7)$$

This is equivalent to

$$\begin{pmatrix} I & B\mathbf{y} \\ \mathbf{y}^T B^T & \mathbf{c}^T \mathbf{y} + d \end{pmatrix} \succeq 0 \qquad (6.8)$$

because the Schur complement of this matrix with respect to $I$ is the negative of the
left side of the original constraint (6.7). Note that in this larger matrix, the variable
$\mathbf{y}$ appears only affinely, not quadratically.
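The equivalence of (6.7) and (6.8) can be spot-checked numerically. The sketch below uses random hypothetical data for $B$, $\mathbf{c}$ and $d$; the function names are illustrative only. Samples that lie numerically on the boundary of the constraint are skipped so that rounding does not blur the comparison.

```python
import numpy as np

def quad_constraint(B, c, d, y):
    """Left side of (6.7): y^T B^T B y - c^T y - d."""
    By = B @ y
    return float(By @ By - c @ y - d)

def lmi_matrix(B, c, d, y):
    """Block matrix of (6.8): [[I, By], [(By)^T, c^T y + d]]."""
    By = B @ y
    m = B.shape[0]
    top = np.hstack([np.eye(m), By[:, None]])
    bottom = np.hstack([By, [c @ y + d]])
    return np.vstack([top, bottom[None, :]])

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 2))
c = rng.standard_normal(2)
d = 1.0

agree = []
for _ in range(200):
    y = rng.standard_normal(2)
    lhs = quad_constraint(B, c, d, y)
    if abs(lhs) < 1e-6:                 # skip points on the boundary
        continue
    is_psd = np.linalg.eigvalsh(lmi_matrix(B, c, d, y)).min() >= -1e-9
    agree.append((lhs <= 0) == bool(is_psd))
```

Every sampled point satisfies the quadratic inequality exactly when the block matrix is positive semidefinite, as the Schur complement argument predicts.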

Indeed, (6.8) can be written as

$$P(\mathbf{y}) = P_0 + y_1 P_1 + y_2 P_2 + \cdots + y_n P_n \succeq 0, \qquad (6.9)$$

where

$$P_0 = \begin{pmatrix} I & \mathbf{0} \\ \mathbf{0}^T & d \end{pmatrix}, \quad P_i = \begin{pmatrix} 0 & \mathbf{b}_i \\ \mathbf{b}_i^T & c_i \end{pmatrix} \ \text{for} \ i = 1, 2, \ldots, n,$$

with $\mathbf{b}_i$ being the $i$th column of $B$ and $c_i$ being the $i$th component of $\mathbf{c}$. The constraint
(6.9) is of the form that appears in the dual form of a semidefinite program.

There is a more efficient mixed semidefinite and second-order cone formulation
of the inequality (6.7) that reduces the dimension of the semidefinite cone. We first introduce
slack variables $\mathbf{s}$ and $s_0$ by linear constraints:

$$B\mathbf{y} - \mathbf{s} = \mathbf{0}.$$

Then, we let $|\mathbf{s}| \le s_0$ (or $(s_0; \mathbf{s})$ in the second-order cone) and

$$\begin{pmatrix} 1 & s_0 \\ s_0 & \mathbf{c}^T \mathbf{y} + d \end{pmatrix} \succeq 0.$$

Again, the matrix constraint is of the dual form of a semidefinite cone, but its
dimension is fixed at 2.

Suppose the original optimization problem has a quadratic objective: mini- 
mize q(x). The objective can be written instead as: minimize t subject to q(x) < t, 
and then this constraint as well as any number of other quadratic constraints can 
be transformed to semidefinite constraints, and hence the entire problem converted 
to a mixed second-order cone and semidefinite program. This approach is useful 
in many applications, especially in various problems of financial engineering and 
control theory. 



The duality is manifested by the relation between the optimal values of the primal 
and dual programs. The weak form of this relation is spelled out in the following 
lemma, the proof of which, like the weak form of other duality relations we have 
studied, is essentially an accounting issue. 

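Weak Duality in CLP. Let $X$ be feasible for (CLP) and $(\mathbf{y}, S)$ feasible for (CLD). Then,

$$C \bullet X \ge \mathbf{y}^T \mathbf{b}.$$

Proof. By direct calculation

$$\begin{aligned}
C \bullet X - \mathbf{y}^T \mathbf{b} &= \left(\sum_{i=1}^m y_i A_i + S\right) \bullet X - \mathbf{y}^T \mathbf{b} \\
&= \sum_{i=1}^m y_i (A_i \bullet X) + S \bullet X - \mathbf{y}^T \mathbf{b} \\
&= \sum_{i=1}^m y_i b_i + S \bullet X - \mathbf{y}^T \mathbf{b} \\
&= S \bullet X \ge 0,
\end{aligned}$$

where the last inequality comes from $X \in K$ and $S \in K^*$. ∎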
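The accounting in the proof can be replayed on a random semidefinite instance. In the sketch below the data are hypothetical: $X$ and $S$ are random positive semidefinite matrices, and $\mathbf{b}$ and $C$ are then defined so that $X$ and $(\mathbf{y}, S)$ are feasible by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3

def random_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

def random_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

A = [random_sym(n) for _ in range(m)]
X = random_psd(n)                                   # primal point in K
b = np.array([np.sum(Ai * X) for Ai in A])          # makes X primal feasible
y = rng.standard_normal(m)
S = random_psd(n)                                   # dual slack in K*
C = sum(yi * Ai for yi, Ai in zip(y, A)) + S        # makes (y, S) dual feasible

gap = float(np.sum(C * X) - y @ b)       # C . X - y^T b
slack_product = float(np.sum(S * X))     # S . X
```

The two quantities agree up to rounding, and `slack_product` is nonnegative because both matrices are positive semidefinite.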

As in other instances of duality, strong duality in conic linear programming
may fail unless other conditions hold. For example, the duality gap may not be zero
at optimality in the following SDP instance.






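Example 6. The following semidefinite program has a duality gap:

$$C = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}.$$

The primal minimal objective value is 0, achieved by

$$X = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

and the dual maximal objective value is $-2$, achieved by $\mathbf{y} = [0, -1]$; so the duality
gap is 2.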
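The stated solutions of Example 6 can be verified directly. The sketch below checks feasibility and objective values only; it does not prove optimality. The right-hand side $\mathbf{b} = (0; 2)$ is the one consistent with the primal solution given in the example.

```python
import numpy as np

C = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
A1 = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
A2 = np.array([[0.0, -1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
b = np.array([0.0, 2.0])

X = np.diag([0.0, 0.0, 1.0])        # primal solution from the text
y = np.array([0.0, -1.0])           # dual solution from the text
S = C - y[0] * A1 - y[1] * A2       # dual slack matrix

primal_residual = np.array([np.sum(A1 * X), np.sum(A2 * X)]) - b
primal_value = float(np.sum(C * X))
dual_value = float(y @ b)
S_min_eig = float(np.linalg.eigvalsh(S).min())
```

Here the slack is $S = \mathrm{diag}(0, 0, 2)$, so $S \bullet X = 2$ equals the gap between the two objective values, in line with the weak duality identity.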

However, under certain technical conditions, there would be no duality gap. One
condition is related to whether or not the primal feasible region $\mathcal{F}_p$ or dual feasible
region $\mathcal{F}_d$ has an interior feasible solution. We say $\mathcal{F}_p$ has an interior (feasible solution)
if and only if

$$\mathring{\mathcal{F}}_p := \{X : \mathcal{A}X = \mathbf{b},\ X \in \mathring{K}\}$$

is non-empty, and $\mathcal{F}_d$ has an interior feasible solution if and only if

$$\mathring{\mathcal{F}}_d := \{(\mathbf{y}, S) : \mathbf{y}^T \mathcal{A} + S = C,\ S \in \mathring{K}^*\}$$

is non-empty. We state here a version of the strong duality theorem.

Strong Duality in (CLP).

i) Let (CLP) or (CLD) be infeasible, and furthermore let the other be feasible and have an
interior. Then the other is unbounded.

ii) Let (CLP) and (CLD) be both feasible, and furthermore let one of them have an interior.
Then there is no duality gap between (CLP) and (CLD).

iii) Let (CLP) and (CLD) be both feasible and have interior. Then, both have optimal
solutions with no duality gap.

Proof. We let cone $H = K \oplus E^1_+$ in the following proof.

i) Suppose $\mathcal{F}_d$ is empty and $\mathcal{F}_p$ is feasible and has an interior feasible solution.
Then, we have an $\bar{X} \in \mathring{K}$ and $\bar{\tau} = 1$ that is an interior feasible solution to the
(homogeneous) conic system:

$$\mathcal{A}\bar{X} - \mathbf{b}\bar{\tau} = \mathbf{0}, \quad (\bar{X}, \bar{\tau}) \in \mathring{H}.$$

Now, for any $z^*$, we form an alternative system pair based on Farkas' Lemma
(Theorem 2):

$$\{(X, \tau) : \mathcal{A}X - \mathbf{b}\tau = \mathbf{0},\ C \bullet X - z^*\tau < 0,\ (X, \tau) \in H\},$$

and

$$\{(\mathbf{y}, S, \kappa) : \mathcal{A}^T \mathbf{y} + S = C,\ -\mathbf{b}^T \mathbf{y} + \kappa = -z^*,\ (S, \kappa) \in H^*\}.$$

But the latter is infeasible, so that the former has a feasible solution $(X, \tau)$.
At such a feasible solution, if $\tau > 0$, we have $C \bullet (X/\tau) < z^*$ for any $z^*$.
Otherwise, $\tau = 0$ implies that a new solution $\bar{X} + \alpha X$ is feasible for (CLP)
for any positive $\alpha$; and, as $\alpha \to \infty$, the objective value of the new solution goes
to $-\infty$. Hence, either way we have a feasible solution for (CLP) whose objective
value is unbounded from below.

ii) Let $\mathcal{F}_p$ be feasible and have an interior feasible solution, and let $z^*$ be its objective
infimum. Again, we have an alternative system pair as listed in the proof
of i). But now the former is infeasible, so that we have a solution for the latter.
From the Weak Duality theorem $\mathbf{b}^T \mathbf{y} \le z^*$, thus we must have $\kappa = 0$, that is, we
have a solution $(\mathbf{y}, S)$ such that

$$\mathcal{A}^T \mathbf{y} + S = C, \quad \mathbf{b}^T \mathbf{y} = z^*, \quad S \in K^*.$$

iii) We only need to prove that there exists a solution $X \in \mathcal{F}_p$ such that $C \bullet X = z^*$,
that is, the infimum of (CLP) is attainable. But this is just the other side of the
proof given that $\mathcal{F}_d$ is feasible and has an interior feasible solution, and $z^*$ is
also the supremum of (CLD). ∎



Again, if one of (CLP) and (CLD) has no interior feasible solution, the common
objective value may not be attainable. For example,

$$C = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad A_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \text{and} \quad b_1 = 2.$$

The dual is feasible but has no interior, while the primal has an interior. The common
objective value equals 0, but no primal solution attains the infimum value.
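A short numerical illustration of the unattained infimum follows. The family $X(t) = \begin{pmatrix} t & 1 \\ 1 & 1/t \end{pmatrix}$, $t > 0$, is an assumption chosen for this sketch: each member is positive semidefinite (zero determinant, positive trace), satisfies $A_1 \bullet X = 2$, and has objective value $t$.

```python
import numpy as np

C = np.array([[1.0, 0.0], [0.0, 0.0]])
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
b1 = 2.0

def X_of(t):
    """A feasible primal point with objective value t > 0."""
    return np.array([[t, 1.0], [1.0, 1.0 / t]])

values, residuals = [], []
for t in (1.0, 1e-2, 1e-4):
    X = X_of(t)
    residuals.append(abs(float(np.sum(A1 * X)) - b1))   # constraint A1 . X = 2
    values.append(float(np.sum(C * X)))                  # objective equals t

# With objective value 0 (x11 = 0), the constraint x12 = 1 destroys semidefiniteness.
X_zero = np.array([[0.0, 1.0], [1.0, 5.0]])
zero_is_psd = bool(np.linalg.eigvalsh(X_zero).min() >= 0)
```

The objective values decrease toward 0 along the family, yet any matrix with $x_{11} = 0$ and $x_{12} = 1$ has a negative determinant and is therefore not feasible.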

Most examples in which strong duality fails are superficial, and
a small perturbation would overcome the failure. Thus, in real applications and in
the rest of the chapter, we may assume that both (CLP) and (CLD) have interior
when they are feasible. Consequently, any primal and dual optimal solution pair
must satisfy the optimality conditions:

$$\begin{array}{c}
C \bullet X - \mathbf{y}^T \mathbf{b} = 0 \\
\mathcal{A}X = \mathbf{b} \\
\mathbf{y}^T \mathcal{A} + S = C^T \\
X \in K, \ S \in K^*
\end{array} \qquad (6.10)$$

or

$$\begin{array}{c}
X \bullet S = 0 \\
\mathcal{A}X = \mathbf{b} \\
\mathbf{y}^T \mathcal{A} + S = C^T \\
X \in K, \ S \in K^*
\end{array} \qquad (6.11)$$



We now present an application of the strong duality theorem. 

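Example 7 (Robust Portfolio Design). The Markowitz portfolio design model (also
see 5) is

$$\begin{array}{ll}
\text{minimize} & \mathbf{x}^T \Sigma \mathbf{x} \\
\text{subject to} & \mathbf{1}^T \mathbf{x} = 1, \ \boldsymbol{\pi}^T \mathbf{x} \ge \pi,
\end{array}$$

where $\Sigma$ is the covariance matrix and $\boldsymbol{\pi}$ is the expected return rate vector of a set of
stocks, and $\pi$ is the desired return rate of the portfolio. The problem can be equivalently
written as a mixed conic problem

$$\begin{array}{ll}
\text{minimize} & \Sigma \bullet X \\
\text{subject to} & \mathbf{1}^T \mathbf{x} = 1, \ \boldsymbol{\pi}^T \mathbf{x} \ge \pi, \\
& X - \mathbf{x}\mathbf{x}^T \succeq 0.
\end{array}$$

Now suppose $\Sigma$ is incomplete and/or uncertain, and it is expressed by

$$\Sigma_0 + \sum_{i=1}^m y_i \Sigma_i \ (\succeq 0),$$

for some variables $y_i$. Then, we would like to solve a robust model

$$\begin{array}{ll}
\text{minimize} & \left\{ \begin{array}{ll} \max_{\mathbf{y}} & (\Sigma_0 + \sum_{i=1}^m y_i \Sigma_i) \bullet X \\ \text{s.t.} & \Sigma_0 + \sum_{i=1}^m y_i \Sigma_i \succeq 0 \end{array} \right\} \\
\text{subject to} & \mathbf{1}^T \mathbf{x} = 1, \ \boldsymbol{\pi}^T \mathbf{x} \ge \pi, \\
& X - \mathbf{x}\mathbf{x}^T \succeq 0.
\end{array}$$

The inner problem is an SDP problem. Assuming strong duality holds, we replace
it by its dual, and have

$$\begin{array}{ll}
\text{minimize} & \left\{ \begin{array}{ll} \min_{Y} & \Sigma_0 \bullet (Y + X) \\ \text{s.t.} & \Sigma_i \bullet (Y + X) = 0, \ \forall i = 1, \ldots, m, \\ & Y \succeq 0 \end{array} \right\} \\
\text{subject to} & \mathbf{1}^T \mathbf{x} = 1, \ \boldsymbol{\pi}^T \mathbf{x} \ge \pi, \\
& X - \mathbf{x}\mathbf{x}^T \succeq 0.
\end{array}$$

Then, we can integrate the two minimization problems together and form

$$\begin{array}{ll}
\text{minimize} & \Sigma_0 \bullet (Y + X) \\
\text{subject to} & \mathbf{1}^T \mathbf{x} = 1, \ \boldsymbol{\pi}^T \mathbf{x} \ge \pi, \\
& \Sigma_i \bullet (Y + X) = 0, \ \forall i = 1, \ldots, m, \\
& Y \succeq 0, \ X - \mathbf{x}\mathbf{x}^T \succeq 0.
\end{array}$$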



6.5 Complementarity and Solution Rank of SDP 

In linear programming, since $\mathbf{x} \ge \mathbf{0}$ and $\mathbf{s} \ge \mathbf{0}$,

$$0 = \mathbf{x} \bullet \mathbf{s} = \mathbf{x}^T \mathbf{s} = \sum_{j=1}^n x_j s_j$$

implies that $x_j s_j = 0$ for all $j = 1, \ldots, n$. This property is often called complementarity.
Thus, besides feasibility, an optimal linear programming solution pair must
satisfy complementarity.

Now consider the semidefinite cone $S^n_+$. Since $X \succeq 0$ and $S \succeq 0$, $0 = X \bullet S$ implies
$XS = 0$, that is, the regular matrix product of the two is a zero matrix. In other
words, every column (or row) of $X$ is orthogonal to every column (or row) of $S$.
We also call this property complementarity. Thus, besides feasibility, an optimal
semidefinite programming solution pair must satisfy complementarity.

Proposition 1. Let $X^*$ and $(\mathbf{y}^*, S^*)$ be any optimal SDP solution pair with zero duality gap.
Then complementarity of $X^*$ and $S^*$ implies

$$\mathrm{rank}(X^*) + \mathrm{rank}(S^*) \le n.$$

Furthermore, if there is an optimal (dual) $S^*$ such that $\mathrm{rank}(S^*) \ge d$, then the rank of any
optimal (primal) $X^*$ is bounded above by $n - d$, where integer $0 \le d \le n$; and the converse
is also true.
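Complementarity and the rank bound can be illustrated with a constructed pair. In the sketch below, which uses hypothetical data, $X$ and $S$ are built on orthogonal subspaces of a random orthonormal basis; the code then checks that $X \bullet S = 0$, that the ordinary product $XS$ vanishes, and that the ranks add up to at most $n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 5, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random orthonormal basis

# X lives on the first r basis vectors, S on the remaining n - r.
X = Q[:, :r] @ np.diag([3.0, 1.0]) @ Q[:, :r].T
S = Q[:, r:] @ np.diag([2.0, 1.0, 0.5]) @ Q[:, r:].T

inner_product = float(np.sum(X * S))                # X . S
product_norm = float(np.linalg.norm(X @ S))         # Frobenius norm of XS
rank_sum = int(np.linalg.matrix_rank(X, tol=1e-9) + np.linalg.matrix_rank(S, tol=1e-9))
```

In this construction the bound is tight: `rank_sum` equals $n$, which corresponds to a strictly complementary pair.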

In certain SDP problems, one may be interested in finding an optimal solution
whose rank is minimal, while the interior-point algorithm for SDP (developed later)
typically generates solutions whose ranks are maximal for the primal and dual, respectively.
Thus, a rank reduction method is sometimes necessary to achieve this goal.
For linear programming in the standard form, it is known that if there is an optimal
solution, then there is an optimal basic solution $\mathbf{x}^*$ with at most $m$ positive entries.
Is there a similar structural fact for semidefinite programming? Indeed, we have

Proposition 2. If there is an optimal solution for SDP, then there is an optimal solution of
SDP whose rank $r$ satisfies $\frac{r(r+1)}{2} \le m$.

The proposition resembles the linear programming fundamental theorem of
Carathéodory in Sect. 2.4. We now give a sketch of a similar constructive proof, as
well as several other rank-reduction methods.






Null-Space Rank Reduction 



Let $X^*$ be an optimal solution of SDP with rank $r$. If $r(r+1)/2 > m$, we orthonormally
factorize $X^*$

$$X^* = (V^*)^T V^*, \quad V^* \in E^{r \times n}.$$

Then we consider a related SDP problem

$$\begin{array}{ll}
\text{minimize} & V^* C (V^*)^T \bullet U \\
\text{subject to} & V^* A_i (V^*)^T \bullet U = b_i, \ i = 1, \ldots, m \qquad (6.12) \\
& U \in S^r_+.
\end{array}$$

Note that, for any feasible solution of (6.12) one can construct a feasible solution
for the original SDP using

$$X(U) = (V^*)^T U V^* \quad \text{and} \quad C \bullet X(U) = V^* C (V^*)^T \bullet U.$$

Thus, the minimal value of (6.12) is also $z^*$, and in particular $U = I$ (the identity
matrix) is a minimizer of (6.12), since

$$V^* C (V^*)^T \bullet I = C \bullet (V^*)^T V^* = C \bullet X^* = z^*.$$

Also, one can show that any feasible solution $U$ of (6.12) is its minimizer, so that
$X(U)$ is a minimizer of the original SDP.

Consider the system of homogeneous linear equations:

$$V^* A_i (V^*)^T \bullet W = 0, \quad i = 1, \ldots, m,$$

where $W \in S^r$ (i.e., an $r \times r$ symmetric matrix that does not need to be semidefinite).
This system has $r(r+1)/2$ real variables and $m$ equations. Thus, as long as
$r(r+1)/2 > m$, we must be able to find a symmetric matrix $W \ne 0$ to satisfy all
the $m$ equations. Without loss of generality, let $W$ be either indefinite or negative
semidefinite (if it is positive semidefinite, we take $-W$ as $W$), that is, $W$ has at
least one negative eigenvalue. Then we consider

$$U(\alpha) = I + \alpha W.$$

Choose $\alpha = \alpha^* > 0$ as the largest value such that $U(\alpha^*) \succeq 0$, namely $\alpha^* = -1/\lambda_{\min}(W)$, so that
$U(\alpha^*)$ has at least one zero eigenvalue (or $\mathrm{rank}\, U(\alpha^*) < r$). Note that

$$V^* A_i (V^*)^T \bullet U(\alpha^*) = V^* A_i (V^*)^T \bullet (I + \alpha^* W) = V^* A_i (V^*)^T \bullet I = b_i, \quad i = 1, \ldots, m.$$

That is, $U(\alpha^*)$ is feasible and also optimal for (6.12). Thus, $X(U(\alpha^*))$ is a new minimizer
for the original SDP, and its rank is strictly less than $r$. This process can be
repeated until the system of homogeneous linear equations has only the all-zero solution,
which can occur only when $r(r+1)/2 \le m$. Such a solution rank reduction procedure
is called the Null-space reduction, which is deterministic.
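The procedure above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than a tuned implementation: the function names, the eigenvalue tolerance and the random test instance are all hypothetical, and the null vector of the homogeneous system is obtained from a singular value decomposition. The test instance is built so that $X^*$ is optimal by construction (a complementary dual slack is attached).

```python
import numpy as np

def inner(A, B):
    return float(np.sum(A * B))

def symmetric_basis(r):
    """A basis of the r(r+1)/2-dimensional space of r x r symmetric matrices."""
    basis = []
    for i in range(r):
        for j in range(i, r):
            E = np.zeros((r, r))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
    return basis

def reduce_rank(X, A_list, tol=1e-8):
    """Repeat the null-space step until r(r+1)/2 <= m; return the new X."""
    m = len(A_list)
    for _ in range(X.shape[0]):
        w, Q = np.linalg.eigh(X)
        keep = w > tol
        r = int(keep.sum())
        if r * (r + 1) // 2 <= m:
            break
        V = (Q[:, keep] * np.sqrt(w[keep])).T           # r x n with X = V^T V
        basis = symmetric_basis(r)
        # Row i holds the coefficients of V A_i V^T . W = 0 in this basis.
        M = np.array([[inner(V @ A @ V.T, E) for E in basis] for A in A_list])
        coef = np.linalg.svd(M)[2][-1]                  # a null vector of M
        W = sum(c * E for c, E in zip(coef, basis))
        lam = np.linalg.eigvalsh(W)
        if lam[-1] > -lam[0]:                           # make the extreme eigenvalue negative
            W, lam = -W, -lam[::-1]
        alpha = -1.0 / lam[0]                           # largest step with I + alpha W >= 0
        X = V.T @ (np.eye(r) + alpha * W) @ V
    return X

rng = np.random.default_rng(3)
n, m = 5, 3
Qo, _ = np.linalg.qr(rng.standard_normal((n, n)))
X_opt = Qo[:, :3] @ np.diag([2.0, 1.0, 0.5]) @ Qo[:, :3].T     # rank-3 solution
S_opt = Qo[:, 3:] @ np.diag([1.0, 1.0]) @ Qo[:, 3:].T          # complementary slack
A_list = [(G + G.T) / 2 for G in rng.standard_normal((m, n, n))]
y = rng.standard_normal(m)
C = sum(yi * Ai for yi, Ai in zip(y, A_list)) + S_opt
b = np.array([inner(Ai, X_opt) for Ai in A_list])

X_new = reduce_rank(X_opt, A_list)
```

With $m = 3$, the starting rank 3 violates $r(r+1)/2 \le m$, so one reduction step is taken; the result keeps the constraint values and the objective value while dropping to rank at most 2.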
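To see an application of Proposition 2, consider a general quadratic minimization
with sphere constraint

$$\begin{array}{rll}
z^* = & \text{minimize} & \mathbf{x}^T Q \mathbf{x} + 2\mathbf{c}^T \mathbf{x} \\
& \text{subject to} & |\mathbf{x}|^2 = 1, \ \mathbf{x} \in E^n,
\end{array}$$

where $Q$ is general. The problem has an SDP relaxation:

$$\begin{array}{rll}
z^{SDP} = & \text{minimize} & \begin{pmatrix} Q & \mathbf{c} \\ \mathbf{c}^T & 0 \end{pmatrix} \bullet Y \\
& \text{subject to} & \begin{pmatrix} I & \mathbf{0} \\ \mathbf{0}^T & 0 \end{pmatrix} \bullet Y = 1, \\
& & \begin{pmatrix} 0 & \mathbf{0} \\ \mathbf{0}^T & 1 \end{pmatrix} \bullet Y = 1, \\
& & Y \in S^{n+1}_+.
\end{array}$$

Note that the relaxation and its dual both have interior so that the strong duality
theorem holds, and it must have a rank-1 optimal SDP solution because $m = 2$: by
Proposition 2 there is an optimal solution with $r(r+1)/2 \le 2$, that is, $r \le 1$. But
a rank-1 optimal SDP solution would be optimal to the original quadratic minimization
with sphere constraint. Thus, we must have $z^* = z^{SDP}$.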



Gaussian Projection Reduction 

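There is also a randomized procedure to produce an approximate SDP solution with
a desired low rank $d$. Again, let $X^*$ be an optimal solution of SDP with rank $r > d$
and we factorize $X^*$ as

$$X^* = (V^*)^T V^*, \quad V^* \in E^{r \times n}.$$

We then generate i.i.d. Gaussian random variables $\xi_i^j$ with mean 0 and variance $1/d$,
$i = 1, \ldots, r$; $j = 1, \ldots, d$, and form random vectors $\boldsymbol{\xi}^j = (\xi_1^j; \ldots; \xi_r^j)$, $j = 1, \ldots, d$.

Finally, we let

$$\hat{X} = (V^*)^T \left( \sum_{j=1}^d \boldsymbol{\xi}^j (\boldsymbol{\xi}^j)^T \right) V^*.$$

Note that the rank of $\hat{X}$ is $d$ and

$$\mathrm{E}(\hat{X}) = (V^*)^T \mathrm{E}\left( \sum_{j=1}^d \boldsymbol{\xi}^j (\boldsymbol{\xi}^j)^T \right) V^* = (V^*)^T I V^* = X^*.$$

One can further show that $\hat{X}$ would be a good rank-$d$ approximate SDP solution in
many cases.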
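The two properties noted above, rank $d$ and $\mathrm{E}(\hat{X}) = X^*$, can be checked by simulation. The sketch below uses a hypothetical random factor $V$ in place of an actual SDP solution and estimates the expectation by a plain Monte Carlo average.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, d = 5, 3, 2
V = rng.standard_normal((r, n)) / np.sqrt(r * n)    # hypothetical factor, X* = V^T V
X_star = V.T @ V

def gaussian_projection(V, d, rng):
    """One sample of X_hat = V^T (sum_j xi^j (xi^j)^T) V, xi entries ~ N(0, 1/d)."""
    xi = rng.normal(0.0, np.sqrt(1.0 / d), size=(V.shape[0], d))   # columns are xi^j
    return V.T @ (xi @ xi.T) @ V

sample = gaussian_projection(V, d, rng)
sample_rank = int(np.linalg.matrix_rank(sample, tol=1e-10))

# Monte Carlo estimate of E(X_hat); it should be close to X*.
N = 20000
mean_est = sum(gaussian_projection(V, d, rng) for _ in range(N)) / N
max_error = float(np.abs(mean_est - X_star).max())
```

A single sample has rank $d$ with probability one, while only the average over many samples reproduces $X^*$; an individual $\hat{X}$ generally violates the linear constraints slightly, which is why it is called an approximate solution.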
Randomized Binary Reduction



As discussed in the binary QP optimization, we would like to produce a vector $\mathbf{x}$ where
each entry is either 1 or $-1$. A procedure to achieve this is as follows. Let $X^*$ be any
optimal solution of SDP and we factorize $X^*$ as

$$X^* = (V^*)^T V^*, \quad V^* \in E^{n \times n}.$$

Then, we generate a random $n$-dimensional vector $\boldsymbol{\xi}$ where each entry is an i.i.d. Gaussian
random variable with mean 0 and variance 1. Then we let

$$\hat{\mathbf{x}} = \mathrm{sign}((V^*)^T \boldsymbol{\xi})$$

where

$$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{otherwise.} \end{cases}$$

It was proved by Sheppard [228]:

$$\mathrm{E}[\hat{x}_i \hat{x}_j] = \frac{2}{\pi} \arcsin(X^*_{ij}), \quad i, j = 1, 2, \ldots, n.$$

Obviously, each entry of $\hat{\mathbf{x}}$ is either 1 or $-1$.

One can further show $\hat{\mathbf{x}}$ would be a good approximate solution to the original
binary QP. Let us consider the (homogeneous) binary quadratic maximization
problem

$$\begin{array}{rll}
z^* := & \text{maximize} & \mathbf{x}^T Q \mathbf{x} \\
& \text{subject to} & x_j \in \{1, -1\}, \ \text{for all } j = 1, \ldots, n,
\end{array}$$

where we assume $Q$ is positive semidefinite. Then, the SDP relaxation would be

$$\begin{array}{rll}
z^{SDP} := & \text{maximize} & Q \bullet X \\
& \text{subject to} & I_j \bullet X = 1, \ \text{for all } j = 1, \ldots, n, \\
& & X \in S^n_+;
\end{array}$$

and let $X^*$ be any optimal solution, from which we produce a random binary vector
$\hat{\mathbf{x}}$. Let us evaluate the expected objective value

$$\mathrm{E}(\hat{\mathbf{x}}^T Q \hat{\mathbf{x}}) = \mathrm{E}(Q \bullet \hat{\mathbf{x}}\hat{\mathbf{x}}^T) = Q \bullet \mathrm{E}(\hat{\mathbf{x}}\hat{\mathbf{x}}^T) = Q \bullet \frac{2}{\pi} \arcsin[X^*] = \frac{2}{\pi} (Q \bullet \arcsin[X^*]),$$

where $\arcsin[X^*] \in S^n$ is the matrix whose $(i, j)$th entry equals $\arcsin(X^*_{ij})$. One can further
show

$$\arcsin[X^*] - X^* \succeq 0$$

so that (from $Q \succeq 0$)

$$Q \bullet \arcsin[X^*] \ge Q \bullet X^* = z^{SDP} \ge z^*,$$

that is, the expected objective value of $\hat{\mathbf{x}}$ is no less than factor $\frac{2}{\pi}$ of the maximal
value of the binary QP.

The randomized binary reduction can be extended to quadratic optimization with
simple bound constraints such as $x_j^2 \le 1$.
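The rounding step and Sheppard's identity can be tested by simulation. In the sketch below, $X^*$ is a hypothetical feasible matrix with unit diagonal (not an actual SDP optimum) and $Q$ is a random positive semidefinite matrix; the inequality $Q \bullet \arcsin[X] \ge Q \bullet X$ used in the bound holds for any such pair.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n))
V = G / np.linalg.norm(G, axis=0)        # unit columns, so V^T V has unit diagonal
X_star = V.T @ V

def round_to_binary(V, rng, size):
    """Rows are samples x_hat = sign(V^T xi) with xi ~ N(0, I)."""
    xi = rng.standard_normal((size, V.shape[0]))
    return np.where(xi @ V > 0, 1.0, -1.0)           # row k equals sign(V^T xi_k)

samples = round_to_binary(V, rng, 200000)
empirical = samples.T @ samples / len(samples)       # estimate of E[x x^T]
sheppard = (2 / np.pi) * np.arcsin(np.clip(X_star, -1.0, 1.0))
max_error = float(np.abs(empirical - sheppard).max())

M = rng.standard_normal((n, n))
Q = M @ M.T                                          # a PSD objective matrix
expected_value = float(np.sum(Q * sheppard))         # E[x^T Q x] by Sheppard's formula
relaxation_value = float(np.sum(Q * X_star))         # Q . X*
```

The empirical second-moment matrix matches $\frac{2}{\pi}\arcsin[X^*]$ up to sampling error, and `expected_value` is at least $\frac{2}{\pi}$ times `relaxation_value`.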



6.6 Interior-Point Algorithms for Conic Linear Programming 

Since (CLP) is a convex minimization problem, many optimization algorithms are
applicable for solving it. However, the most natural conic linear programming algorithm
seems to be an extension of the interior-point linear programming algorithm
described in Chap. 5. We now describe it.

To develop efficient interior-point algorithms, the key is to find a suitable barrier
or potential function. There is a general theory on the selection of barrier functions for
(CLP), depending on the convex cone involved. We present a few for the convex cones
listed in Example 1.

Example 1. The following are barrier functions for each of the convex cones.

• The $n$-dimensional non-negative orthant $E^n_+$:

$$B(\mathbf{x}) = -\sum_{j=1}^n \log(x_j).$$

• The $n$-dimensional semidefinite cone $S^n_+$:

$$B(X) = -\log(\det X).$$

• The $(n+1)$-dimensional second-order cone $\{(u; \mathbf{x}) : u \ge |\mathbf{x}|\}$:

$$B(u; \mathbf{x}) = -\log(u^2 - |\mathbf{x}|^2).$$
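The three barrier functions are simple to evaluate. The following sketch, with hypothetical function names, implements them with NumPy; each is meant to be called only at interior points of its cone.

```python
import numpy as np

def barrier_orthant(x):
    """B(x) = -sum_j log(x_j) for x > 0."""
    return float(-np.sum(np.log(x)))

def barrier_psd(X):
    """B(X) = -log det X for positive definite X."""
    sign, logdet = np.linalg.slogdet(X)
    return float(-logdet)

def barrier_soc(u, x):
    """B(u; x) = -log(u^2 - |x|^2) for u > |x|."""
    return float(-np.log(u**2 - x @ x))

x = np.array([0.5, 2.0, 4.0])
orthant_value = barrier_orthant(x)
psd_value = barrier_psd(np.diag(x))        # diagonal case reduces to the orthant barrier
soc_value = barrier_soc(2.0, np.array([1.0, 1.0]))
near_boundary = barrier_orthant(np.array([1e-8, 1.0]))
```

For a diagonal matrix the semidefinite barrier coincides with the orthant barrier, and all three grow without bound as the argument approaches the boundary of the cone.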

In the rest of the section, we devote our discussion to solving (SDP). Similar to
LP, we consider (SDP) with the barrier function added in the objective:

$$\begin{array}{lll}
(SDPB) & \text{minimize} & C \bullet X - \mu \log \det(X) \\
& \text{subject to} & X \in \mathring{\mathcal{F}}_p,
\end{array}$$

or (SDD) with the barrier function added in the objective:

$$\begin{array}{lll}
(SDDB) & \text{maximize} & \mathbf{y}^T \mathbf{b} + \mu \log \det(S) \\
& \text{subject to} & (\mathbf{y}, S) \in \mathring{\mathcal{F}}_d,
\end{array}$$

where again $\mu > 0$ is called the barrier weight parameter. For a given $\mu$, the minimizers
of (SDPB) and (SDDB) satisfy conditions:

$$\begin{array}{c}
XS = \mu I \\
\mathcal{A}X = \mathbf{b} \\
\mathcal{A}^T \mathbf{y} + S = C \\
X \succ 0, \ S \succ 0
\end{array} \qquad (6.13)$$

Since

$$\mu = \frac{\mathrm{trace}(XS)}{n} = \frac{X \bullet S}{n} = \frac{C \bullet X - \mathbf{y}^T \mathbf{b}}{n},$$

$\mu$ equals the average of complementarity or duality gap. These minimizers,
denoted by $(X(\mu), \mathbf{y}(\mu), S(\mu))$, form the central path of SDP for $\mu \in (0, \infty)$. It is
known that when $\mu \to 0$, $(X(\mu), \mathbf{y}(\mu), S(\mu))$ tends to an optimal solution pair whose
rank is maximal (Exercise 11).

We can also extend the primal-dual potential function from LP to SDP as a
descent merit function:

$$\psi_{n+\rho}(X, S) = (n + \rho) \log(X \bullet S) - \log(\det(X) \cdot \det(S))$$

where $\rho \ge 0$. Note that if $X$ and $S$ are diagonal matrices, these definitions reduce to
those for linear programming.

Once we have an interior feasible point $(X, \mathbf{y}, S)$, we can generate a new iterate
$(X^+, \mathbf{y}^+, S^+)$ by solving for $(D_x, \mathbf{d}_y, D_s)$ from the primal-dual system of linear
equations

$$\begin{array}{l}
D^{-1} D_x D^{-1} + D_s = \frac{n}{n+\rho} \mu X^{-1} - S, \\
A_i \bullet D_x = 0, \ \text{for all } i, \\
\sum_{i=1}^m (\mathbf{d}_y)_i A_i + D_s = 0,
\end{array} \qquad (6.14)$$

where $D$ is the (scaling) matrix

$$D = X^{.5} (X^{.5} S X^{.5})^{-.5} X^{.5}$$

and $\mu = X \bullet S / n$. Then one assigns $X^+ = X + \alpha D_x$, $\mathbf{y}^+ = \mathbf{y} + \alpha \mathbf{d}_y$, and $S^+ = S + \alpha D_s$
for a step size $\alpha > 0$. Furthermore, it can be shown that there exists a step size $\alpha = \bar{\alpha}$
such that

$$\psi_{n+\rho}(X^+, S^+) - \psi_{n+\rho}(X, S) \le -\delta$$

for a constant $\delta > 0.2$.

We outline the algorithm here.

Step 1. Given $(X^0, \mathbf{y}^0, S^0)$. Set $\rho \ge \sqrt{n}$ and $k := 0$.

Step 2. Set $(X, S) = (X^k, S^k)$ and compute $(D_x, \mathbf{d}_y, D_s)$ from (6.14).

Step 3. Let $X^{k+1} = X^k + \bar{\alpha} D_x$, $\mathbf{y}^{k+1} = \mathbf{y}^k + \bar{\alpha} \mathbf{d}_y$, and $S^{k+1} = S^k + \bar{\alpha} D_s$, where

$$\bar{\alpha} = \arg\min_{\alpha \ge 0} \psi_{n+\rho}(X^k + \alpha D_x, S^k + \alpha D_s).$$

Step 4. Let $k := k + 1$. If $\frac{S^k \bullet X^k}{S^0 \bullet X^0} \le \epsilon$, stop. Otherwise return to Step 2.

Theorem 3. Let $\psi_{n+\rho}(X^0, S^0) \le \rho \log(X^0 \bullet S^0) + n \log n$. Then, the algorithm terminates in
at most $O(\rho \log(n/\epsilon))$ iterations.
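Steps 2 and 3 can be sketched in NumPy as follows. This is a hedged illustration, not the text's implementation: the instance is random with the hypothetical starting point $X^0 = S^0 = I$, the system (6.14) is solved by eliminating $D_x$ and $D_s$ to obtain an $m \times m$ linear system in $\mathbf{d}_y$ (one possible arrangement), and the exact minimization over $\alpha$ is replaced by a crude grid search.

```python
import numpy as np

def inner(A, B):
    return float(np.sum(A * B))

def sym_power(M, p):
    """M**p for a symmetric positive definite M, via its eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    return (Q * w**p) @ Q.T

def potential(X, S, rho):
    n = X.shape[0]
    return (n + rho) * np.log(inner(X, S)) - np.linalg.slogdet(X)[1] - np.linalg.slogdet(S)[1]

def direction(X, S, A_list, rho):
    """Solve (6.14) for (D_x, d_y, D_s) by eliminating D_x and D_s."""
    n = X.shape[0]
    mu = inner(X, S) / n
    Xh = sym_power(X, 0.5)
    D = Xh @ sym_power(Xh @ S @ Xh, -0.5) @ Xh            # scaling matrix
    R = n / (n + rho) * mu * np.linalg.inv(X) - S         # right-hand side of (6.14)
    M = np.array([[inner(Ai, D @ Aj @ D) for Aj in A_list] for Ai in A_list])
    rhs = -np.array([inner(Ai, D @ R @ D) for Ai in A_list])
    dy = np.linalg.solve(M, rhs)
    Ds = -sum(d * A for d, A in zip(dy, A_list))
    Dx = D @ (R - Ds) @ D
    return (Dx + Dx.T) / 2, dy, Ds

def line_search(X, S, Dx, Ds, rho):
    """Crude grid search for the step size (the text uses an exact argmin)."""
    best_alpha, best_val = 0.0, potential(X, S, rho)
    for k in range(40):
        alpha = 0.5**k
        Xn, Sn = X + alpha * Dx, S + alpha * Ds
        if np.linalg.eigvalsh(Xn).min() > 0 and np.linalg.eigvalsh(Sn).min() > 0:
            val = potential(Xn, Sn, rho)
            if val < best_val:
                best_alpha, best_val = alpha, val
    return best_alpha

rng = np.random.default_rng(6)
n, m = 4, 3
A_list = [(G + G.T) / 2 for G in rng.standard_normal((m, n, n))]
X, S, y = np.eye(n), np.eye(n), rng.standard_normal(m)
b = np.array([inner(A, X) for A in A_list])               # makes X feasible
C = sum(yi * A for yi, A in zip(y, A_list)) + S           # makes (y, S) feasible
rho = np.sqrt(n)

history = [potential(X, S, rho)]
for _ in range(20):
    Dx, dy, Ds = direction(X, S, A_list, rho)
    alpha = line_search(X, S, Dx, Ds, rho)
    X, y, S = X + alpha * Dx, y + alpha * dy, S + alpha * Ds
    history.append(potential(X, S, rho))
```

The recorded potential values never increase, the iterates stay feasible because the direction satisfies the last two equations of (6.14), and the complementarity gap $X \bullet S$ ends below its starting value $n$.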



Initialization: The HSD Algorithm

The linear programming Homogeneous Self-Dual Algorithm is also extendable to
conic linear programming. Consider the minimization problem

$$\begin{array}{llll}
(HSDCLP) & \min & (n+1)\theta & \\
& \text{s.t.} & \mathcal{A}X - \mathbf{b}\tau + \bar{\mathbf{b}}\theta & = \mathbf{0}, \\
& & -\mathcal{A}^T \mathbf{y} + C\tau - \bar{C}\theta & = S \in K^*, \\
& & \mathbf{b}^T \mathbf{y} - C \bullet X + \bar{z}\theta & = \kappa \ge 0, \\
& & -\bar{\mathbf{b}}^T \mathbf{y} + \bar{C} \bullet X - \bar{z}\tau & = -(n+1), \\
& & \mathbf{y} \ \text{free}, \ X \in K, \ \tau \ge 0, \ \theta \ \text{free}, &
\end{array}$$

where

$$\bar{\mathbf{b}} = \mathbf{b} - \mathcal{A}X^0, \quad \bar{C} = C - S^0, \quad \bar{z} = C \bullet X^0 + 1.$$

Here $X^0$ and $S^0$ are any pair of interior points in the interior of $K$ and $K^*$ such
that they form a central path point with $\mu = 1$. Note that $X^0$ and $S^0$ do not need to
satisfy the other equality constraints, so that they can be easily identified. For example,
$\mathbf{x}^0 = \mathbf{s}^0 = \mathbf{1}$ for the nonnegative orthant cone; $\mathbf{x}^0 = \mathbf{s}^0 = (1; \mathbf{0})$ for the $p$-order cone;
and $X^0 = S^0 = I$ for the semidefinite cone.

Let $\mathcal{F}$ be the set of all feasible points $(\mathbf{y}, X \in K, \tau \ge 0, \theta, S \in K^*, \kappa \ge 0)$. Then $\mathring{\mathcal{F}}$
is the set of interior feasible points $(\mathbf{y}, X \in \mathring{K}, \tau > 0, \theta, S \in \mathring{K}^*, \kappa > 0)$.

Theorem 4. Consider the conic optimization (HSDCLP).

i) (HSDCLP) is self-dual, that is, its dual has an identical form of (HSDCLP).

ii) (HSDCLP) has an optimal solution and its optimal solution set is bounded.

iii) (HSDCLP) has an interior feasible point

$$\mathbf{y} = \mathbf{0}, \quad X = X^0, \quad \tau = 1, \quad \theta = 1, \quad S = S^0, \quad \kappa = 1.$$

iv) For any feasible point $(\mathbf{y}, X, \tau, \theta, S, \kappa)$

$$S^0 \bullet X + X^0 \bullet S + \tau + \kappa - (n+1)\theta = (n+1),$$

and

$$X \bullet S + \tau\kappa = (n+1)\theta.$$

v) The optimal objective value of (HSDCLP) is zero, that is, any optimal solution of
(HSDCLP) has

$$X^* \bullet S^* + \tau^*\kappa^* = (n+1)\theta^* = 0.$$
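Part iii) of Theorem 4 is easy to confirm numerically for the semidefinite cone. The sketch below uses random hypothetical data $(\mathcal{A}, \mathbf{b}, C)$ with $X^0 = S^0 = I$ and evaluates the residuals of the four equality constraints of (HSDCLP) at the stated point, together with the two identities of part iv).

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 4, 3
A_list = [(G + G.T) / 2 for G in rng.standard_normal((m, n, n))]
C0 = rng.standard_normal((n, n))
C = (C0 + C0.T) / 2
b = rng.standard_normal(m)

def A_op(X):
    """The operator A X = (A_1 . X, ..., A_m . X)."""
    return np.array([np.sum(A * X) for A in A_list])

def A_adj(y):
    """The adjoint A^T y = sum_i y_i A_i."""
    return sum(yi * A for yi, A in zip(y, A_list))

X0, S0 = np.eye(n), np.eye(n)           # central pair with mu = 1
b_bar = b - A_op(X0)
C_bar = C - S0
z_bar = np.sum(C * X0) + 1.0

y, X, tau, theta, S, kappa = np.zeros(m), X0, 1.0, 1.0, S0, 1.0
res1 = A_op(X) - b * tau + b_bar * theta
res2 = -A_adj(y) + C * tau - C_bar * theta - S
res3 = b @ y - np.sum(C * X) + z_bar * theta - kappa
res4 = -b_bar @ y + np.sum(C_bar * X) - z_bar * tau + (n + 1)

identity_a = np.sum(S0 * X) + np.sum(X0 * S) + tau + kappa - (n + 1) * theta
identity_b = np.sum(X * S) + tau * kappa
```

All four residuals vanish regardless of the data, which is what makes this starting point convenient: no feasible point of the original (CLP) or (CLD) needs to be known in advance.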

Now we are ready to apply the interior-point algorithm, starting from an available
initial interior-point feasible solution, to solve (HSDCLP). The question is: how is
an optimal solution of (HSDCLP) related to optimal solutions of the original (CLP) and
(CLD)? We present the next theorem, and leave its proof as an exercise.

Theorem 5. Let $(\mathbf{y}^*, X^*, \tau^*, \theta^* = 0, S^*, \kappa^*)$ be a (maximal rank) optimal solution of (HSDCLP)
(as it is typically computed by interior-point algorithms).

i) (CLP) and (CLD) have an optimal solution pair if and only if $\tau^* > 0$. In this case,
$X^*/\tau^*$ is an optimal solution for (CLP) and $(\mathbf{y}^*/\tau^*, S^*/\tau^*)$ is an optimal solution for
(CLD).

ii) (CLP) or (CLD) has an infeasibility certificate if and only if $\kappa^* > 0$. In this case, $X^*/\kappa^*$
or $S^*/\kappa^*$ or both are certificates for proving infeasibility; see Farkas' lemma for CLP.

iii) For all other cases, $\tau^* = \kappa^* = 0$.



6.7 Summary 

A relatively new class of mathematical programming problems, conic linear programming
(hereafter CLP), is a natural extension of linear programming, which is a
central decision model in management science and operations research. In CLP,
the unknown is a vector or matrix in a closed convex cone while its entries are also
restricted by some linear equalities and/or inequalities.

One such cone is the semidefinite cone, that is, the set of all symmetric positive
semidefinite matrices in a given dimension. A variety of interesting and
important practical problems can be naturally cast in this form, because many
problems that appear nonlinear (such as quadratic problems) become essentially
linear in semidefinite form. We have described some of these applications and selected
results in combinatorial optimization, robust optimization, and engineering
sensor networks. We have also illustrated some analyses to show why CLP is an
effective model for tackling these difficult optimization problems.

We have presented the fundamental theorems underlying conic linear programming. These
theorems include Farkas' lemma, weak and strong dualities, and solution rank structure.
We have shown the common features and differences of these theorems between LP
and CLP.

The efficient interior-point algorithms for linear programming can be extended
to solving these problems as well. We have described these extensions applied to general
conic programming problems. These algorithms closely parallel those for linear
programming. There is again a central path and potential functions, and Newton's
method is a good way to follow the path or reduce the potential function. The homogeneous
and self-dual algorithm, which is popularly used for linear programming,
is also extended to CLP.
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6.8 Exercises 

1. Prove that 

i) The dual cone of E+ is itself. 

ii) The dual cone of is itself. 

iii) The dual cone of p-order cone is the p-order cone where - + - = 1 and 

1 < p < oo. 

2. When both K\ and K 2 are closed convex cones. Show 

i) {K\Y = K x . 

ii) Ki c K 2 => K* c K*. 

iii) (Ki®K 2 T = K{®K*. 

iv) (K x + K 2 y = K{ H K*. 

v) (Ki n K 2 y = k\ + /q. 

Note: by definition S + T = {s + t : s e S, teT}. 

3. Prove the following: 

i) Theorem 1. 

ii) Proposition 1. 

iii) Let X ek and Y eK\ Then X • Y > 0. 

4. Guess an optimal solution and the optimal objective value of each instance of 
Example 1. 

5. Prove the second statement of Theorem 2. 

6. Verify the weak duality theorem of the three CLP instances in Example 1 in 
Sect. 6.2 and Example 1 in Sect. 6.4. 

7. Consider the SDP relaxation of the sensor network localization problem with
four sensors:

$$ (\mathbf{e}_i - \mathbf{e}_j)(\mathbf{e}_i - \mathbf{e}_j)^T \bullet X = 1, \quad \forall\, i < j = 1, 2, 3, 4, $$

$$ X \in S^4_+, $$

in which $m = 6$. Show that the SDP problem has a solution with rank 3, which
reaches the bound of Proposition 2.

8. Let $A$ and $B$ be two symmetric and positive semidefinite matrices. Prove that
$A \bullet B \ge 0$, and $A \bullet B = 0$ implies $AB = 0$.

9. Let $X$ and $S$ both be positive definite. Prove that

$$ n \log(X \bullet S) - \log(\det(X) \cdot \det(S)) \ge n \log n. $$

10. Consider an SDP and the potential level set

$$ \Psi(\delta) = \{(X, \mathbf{y}, S) \in \operatorname{int} \mathcal{F} : \psi_{n+\rho}(X, S) \le \delta\}. $$

Prove that

$$ \Psi(\delta^1) \subset \Psi(\delta^2) \quad \text{if } \delta^1 \le \delta^2, $$

and for every $\delta$, $\Psi(\delta)$ is bounded and its closure has non-empty intersection
with the SDP solution set.

11. Let both (SDP) and (SDD) have interior feasible points. Then for any $0 < \mu < \infty$,
the central path point $(X(\mu), \mathbf{y}(\mu), S(\mu))$ exists and is unique. Moreover,

i) the central path point $(X(\mu), \mathbf{y}(\mu), S(\mu))$ is bounded where $0 < \mu \le \mu^0$ for
any given $0 < \mu^0 < \infty$.

ii) For $0 < \mu' < \mu$,

$$ C \bullet X(\mu') < C \bullet X(\mu) \quad \text{and} \quad \mathbf{b}^T \mathbf{y}(\mu') > \mathbf{b}^T \mathbf{y}(\mu) $$

if $X(\mu) \ne X(\mu')$ and $\mathbf{y}(\mu) \ne \mathbf{y}(\mu')$.

iii) $(X(\mu), \mathbf{y}(\mu), S(\mu))$ converges to an optimal solution pair for (SDP) and
(SDD), and the rank of the limit of $X(\mu)$ is maximal among all optimal
solutions of (SDP) and the rank of the limit of $S(\mu)$ is maximal among all
optimal solutions of (SDD).

12. Prove the logarithmic approximation lemma for SDP. Let $D \in S^n$ and $|D|_\infty < 1$.
Then,

$$ \operatorname{trace}(D) \ge \log\det(I + D) \ge \operatorname{trace}(D) - \frac{|D|^2}{2(1 - |D|_\infty)}. $$

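13. Let $V \in \operatorname{int} S^n_+$ and $\rho \ge \sqrt{n}$. Then,

$$ \frac{\left| V^{-1/2} - \frac{n+\rho}{I \bullet V} V^{1/2} \right|}{\left| V^{-1/2} \right|_\infty} \ge \sqrt{3/4}. $$
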
14. Prove both Theorems 4 and 5. 
The sensor localization problem described here is due to Biswas and Ye [B17].
Note that we can view the Sensor Network Localization problem as a Graph 
Realization or Embedding problem in Euclidean spaces, see So and Ye [231] 
and references therein; and it is related to the Euclidean Distance Matrix Com- 
pletion Problems, see Alfakih et al. [3] and Laurent [151]. 

6.3 Farkas’ lemma for conic linear constraints is closely linked to convex analysis
(e.g., Rockafellar [219]) and to the CLP duality theorems commented on next.

6.4 The conic formulation of the Euclidean facility location problem is due to
Xue and Ye [264]. For a discussion of Schur complements see Boyd and
Vandenberghe [B23]. Robust optimization models using SDP can be found in Ben-Tal
and Nemirovski [26] and Goldfarb and Iyengar [112], among others. The SDP duality
theory was studied by Barvinok [16], Nesterov and Nemirovskii [N2], Ramana
[214], Ramana et al. [215], etc. The SDP example with a duality gap was
constructed by R. Freund (private communication).

6.5 Complementarity and rank. The exact rank theorem described here is due to
Pataki [205]; see also Barvinok [15]. An analysis of the Gaussian projection was
presented by So et al. [232], which can be seen as a generalization of the
Johnson and Lindenstrauss theorem [137]. The expectation of the randomized binary
reduction is due to Sheppard [228] in 1900, and it was extensively used in
Goemans and Williamson [G8], Nesterov [189], Ye [265], and Bertsimas and
Ye [31].

6.6 In interior-point algorithms, the search direction $(D_X, \mathbf{d}_y, D_S)$ can be determined
by Newton’s method with three different scalings: primal, dual and primal-dual.
A primal-scaling (potential reduction) algorithm for semidefinite programming
is due to Alizadeh [A4, A3] where Yinyu Ye “suggested studying the primal-dual
potential function for this problem” and “looking at symmetric preserving
scalings of the form $Y^{-1/2} X Y^{-1/2}$”, and to Nesterov and Nemirovskii [N2]. A
dual-scaling algorithm was developed by Benson et al. [25] which exploits the
sparse structure of the dual SDP. The primal-dual SDP algorithm described here
is due to Nesterov and Todd [N3] and references therein.

Efficient interior-point algorithms are also developed for optimization over the 
second-order cone; see Nesterov and Nemirovskii [N2] and Xue and Ye [264]. 
These algorithms have established the best approximation complexity results 
for certain combinatorial location problems. 

The homogeneous and self-dual initialization model was originally developed
by Ye, Todd and Mizuno for LP [Y2], and for SDP by de Klerk et al. [72], Luo
et al. [L18], and Nesterov et al. [191], and it became the foundational algorithm
implemented in Sturm [S11] and Andersen [6].



Part II 

Unconstrained Problems 




Chapter 7 

Basic Properties of Solutions and Algorithms 



In this chapter we consider optimization problems of the form 

$$ \begin{array}{ll} \text{minimize} & f(\mathbf{x}) \\ \text{subject to} & \mathbf{x} \in \Omega, \end{array} \tag{7.1} $$

where $f$ is a real-valued function and $\Omega$, the feasible set, is a subset of $E^n$.
Throughout most of the chapter attention is restricted to the case where $\Omega = E^n$,
corresponding to the completely unconstrained case, but sometimes we consider
cases where $\Omega$ is some particularly simple subset of $E^n$.

The first and third sections of the chapter characterize the first- and second-order
conditions that must hold at a solution point of (7.1). These conditions are simply
extensions to $E^n$ of the well-known derivative conditions for a function of a single
variable that hold at a maximum or a minimum point. The fourth and fifth sections
of the chapter introduce the important classes of convex and concave functions that
provide zeroth-order conditions as well as a natural formulation for a global theory
of optimization and provide geometric interpretations of the derivative conditions
derived in the earlier sections.

The final sections of the chapter are devoted to basic convergence characteristics 
of algorithms. Although this material is not exclusively applicable to optimization 
problems but applies to general iterative algorithms for solving other problems as 
well, it can be regarded as a fundamental prerequisite for a modern treatment of 
optimization techniques. Two essential questions are addressed concerning itera- 
tive algorithms. The first question, which is qualitative in nature, is whether a given 
algorithm in some sense yields, at least in the limit, a solution to the original prob- 
lem. This question is treated in Sect. 7.6, and conditions sufficient to guarantee 
appropriate convergence are established. The second question, the more quantita- 
tive one, is related to how fast the algorithm converges to a solution. This question 
is defined more precisely in Sect. 7.7. Several special types of convergence, which 
arise frequently in the development of algorithms for optimization, are explored. 
7.1 First-Order Necessary Conditions

Perhaps the first question that arises in the study of the minimization problem (7.1)
is whether a solution exists. The main result that can be used to address this issue is
the theorem of Weierstrass, which states that if $f$ is continuous and $\Omega$ is compact, a
solution exists (see Appendix A.6). This is a valuable result that should be kept in
mind throughout our development; however, our primary concern is with
characterizing solution points and devising effective methods for finding them.

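In an investigation of the general problem (7.1) we distinguish two kinds of
solution points: local minimum points, and global minimum points.

Definition. A point $\mathbf{x}^* \in \Omega$ is said to be a relative minimum point or a local minimum point
of $f$ over $\Omega$ if there is an $\varepsilon > 0$ such that $f(\mathbf{x}) \ge f(\mathbf{x}^*)$ for all $\mathbf{x} \in \Omega$ within a distance $\varepsilon$ of
$\mathbf{x}^*$ (that is, $\mathbf{x} \in \Omega$ and $|\mathbf{x} - \mathbf{x}^*| < \varepsilon$). If $f(\mathbf{x}) > f(\mathbf{x}^*)$ for all $\mathbf{x} \in \Omega$, $\mathbf{x} \ne \mathbf{x}^*$, within a distance $\varepsilon$
of $\mathbf{x}^*$, then $\mathbf{x}^*$ is said to be a strict relative minimum point of $f$ over $\Omega$.

Definition. A point $\mathbf{x}^* \in \Omega$ is said to be a global minimum point of $f$ over $\Omega$ if $f(\mathbf{x}) \ge f(\mathbf{x}^*)$
for all $\mathbf{x} \in \Omega$. If $f(\mathbf{x}) > f(\mathbf{x}^*)$ for all $\mathbf{x} \in \Omega$, $\mathbf{x} \ne \mathbf{x}^*$, then $\mathbf{x}^*$ is said to be a strict global
minimum point of $f$ over $\Omega$.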

In formulating and attacking problem (7.1) we are, by definition, explicitly
asking for a global minimum point of $f$ over the set $\Omega$. Practical reality, however, both
from the theoretical and computational viewpoint, dictates that we must in many 
circumstances be content with a relative minimum point. In deriving necessary con- 
ditions based on the differential calculus, for instance, or when searching for the 
minimum point by a convergent stepwise procedure, comparison of the values at
nearby points is all that is possible, and attention focuses on relative minimum points.
Global conditions and global solutions can, as a rule, only be found if the problem 
possesses certain convexity properties that essentially guarantee that any relative 
minimum is a global minimum. Thus, in formulating and attacking problem (7.1) 
we shall, by the dictates of practicality, usually consider, implicitly, that we are 
asking for a relative minimum point. If appropriate conditions hold, this will also be 
a global minimum point. 



Feasible Directions 

To derive necessary conditions satisfied by a relative minimum point $\mathbf{x}^*$, the basic
idea is to consider movement away from the point in some given direction. Along
any given direction the objective function can be regarded as a function of a single
variable, the parameter defining movement in this direction, and hence the ordinary
calculus of a single variable is applicable. Thus given $\mathbf{x} \in \Omega$ we are motivated to say
that a vector $\mathbf{d}$ is a feasible direction at $\mathbf{x}$ if there is an $\bar{\alpha} > 0$ such that $\mathbf{x} + \alpha \mathbf{d} \in \Omega$
for all $\alpha$, $0 \le \alpha \le \bar{\alpha}$. With this simple concept we can state some simple conditions
satisfied by relative minimum points.



Proposition 1 (First-Order Necessary Conditions). Let $\Omega$ be a subset of $E^n$ and let $f \in C^1$
be a function on $\Omega$. If $\mathbf{x}^*$ is a relative minimum point of $f$ over $\Omega$, then for any $\mathbf{d} \in E^n$ that
is a feasible direction at $\mathbf{x}^*$, we have $\nabla f(\mathbf{x}^*)\mathbf{d} \ge 0$.

Proof. For any $\alpha$, $0 \le \alpha \le \bar{\alpha}$, the point $\mathbf{x}(\alpha) = \mathbf{x}^* + \alpha \mathbf{d} \in \Omega$. For $0 \le \alpha \le \bar{\alpha}$ define
the function $g(\alpha) = f(\mathbf{x}(\alpha))$. Then $g$ has a relative minimum at $\alpha = 0$. A typical $g$ is
shown in Fig. 7.1. By the ordinary calculus we have

$$ g(\alpha) - g(0) = g'(0)\alpha + o(\alpha), \tag{7.2} $$

where $o(\alpha)$ denotes terms that go to zero faster than $\alpha$ (see Appendix A). If $g'(0) < 0$
then, for sufficiently small values of $\alpha > 0$, the right side of (7.2) will be negative,
and hence $g(\alpha) - g(0) < 0$, which contradicts the minimal nature of $g(0)$. Thus
$g'(0) = \nabla f(\mathbf{x}^*)\mathbf{d} \ge 0$. ∎

A very important special case is where $\mathbf{x}^*$ is in the interior of $\Omega$ (as would be
the case if $\Omega = E^n$). In this case there are feasible directions emanating in every
direction from $\mathbf{x}^*$, and hence $\nabla f(\mathbf{x}^*)\mathbf{d} \ge 0$ for all $\mathbf{d} \in E^n$. This implies $\nabla f(\mathbf{x}^*) = \mathbf{0}$.
We state this important result as a corollary.

Corollary (Unconstrained Case). Let $\Omega$ be a subset of $E^n$, and let $f \in C^1$ be a function on
$\Omega$. If $\mathbf{x}^*$ is a relative minimum point of $f$ over $\Omega$ and if $\mathbf{x}^*$ is an interior point of $\Omega$, then
$\nabla f(\mathbf{x}^*) = \mathbf{0}$.

The necessary conditions in the pure unconstrained case lead to $n$ equations
(one for each component of $\nabla f$) in $n$ unknowns (the components of $\mathbf{x}^*$), which in
many cases can be solved to determine the solution. In practice, however, as demon- 
strated in the following chapters, an optimization problem is solved directly without 
explicitly attempting to solve the equations arising from the necessary conditions. 
Nevertheless, these conditions form a foundation for the theory. 







Fig. 7.1 Construction for proof 
Example 1. Consider the problem

$$ \text{minimize} \quad f(x_1, x_2) = x_1^2 - x_1 x_2 + x_2^2 - 3x_2. $$

There are no constraints, so $\Omega = E^2$. Setting the partial derivatives of $f$ equal to zero
yields the two equations

$$ \begin{aligned} 2x_1 - x_2 &= 0 \\ -x_1 + 2x_2 &= 3. \end{aligned} $$

These have the unique solution $x_1 = 1$, $x_2 = 2$, which is a global minimum point
of $f$.
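The two equations of Example 1 form a small linear system, so the stationary point can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the names `A`, `rhs`, and `x_star` are ours and not part of the text.

```python
import numpy as np

# Setting the gradient of f(x1, x2) = x1^2 - x1*x2 + x2^2 - 3*x2 to zero gives
#    2*x1 -   x2 = 0
#   -  x1 + 2*x2 = 3
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
rhs = np.array([0.0, 3.0])

x_star = np.linalg.solve(A, rhs)   # stationary point of f


def f(x):
    """Objective of Example 1."""
    x1, x2 = x
    return x1**2 - x1 * x2 + x2**2 - 3.0 * x2


print("stationary point:", x_star)
print("objective value :", f(x_star))
```

The coefficient matrix `A` is also the (constant) Hessian of the objective, which is why a single linear solve suffices here.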



Example 2. Consider the problem

$$ \begin{array}{ll} \text{minimize} & f(x_1, x_2) = x_1^2 - x_1 + x_2 + x_1 x_2 \\ \text{subject to} & x_1 \ge 0, \quad x_2 \ge 0. \end{array} $$

This problem has a global minimum at $x_1 = \frac{1}{2}$, $x_2 = 0$. At this point

$$ \frac{\partial f}{\partial x_1} = 2x_1 - 1 + x_2 = 0 $$

$$ \frac{\partial f}{\partial x_2} = 1 + x_1 = \frac{3}{2}. $$

Thus, the partial derivatives do not both vanish at the solution, but since any
feasible direction must have an $x_2$ component greater than or equal to zero, we have
$\nabla f(\mathbf{x}^*)\mathbf{d} \ge 0$ for all $\mathbf{d} \in E^2$ such that $\mathbf{d}$ is a feasible direction at the point $(1/2, 0)$.
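The first-order condition of Example 2 can be probed numerically by sampling feasible directions at the point $(1/2, 0)$. This is only an illustrative sketch, assuming NumPy; the sampling scheme (random directions with the second component forced nonnegative) is our choice, not part of the text.

```python
import numpy as np


def grad_f(x):
    """Gradient (as a vector) of f(x1, x2) = x1^2 - x1 + x2 + x1*x2."""
    x1, x2 = x
    return np.array([2.0 * x1 - 1.0 + x2, 1.0 + x1])


x_star = np.array([0.5, 0.0])
g = grad_f(x_star)

# At x_star only the constraint x2 >= 0 is active, so a direction d is
# feasible exactly when its second component is nonnegative.
rng = np.random.default_rng(0)
directions = rng.normal(size=(1000, 2))
directions[:, 1] = np.abs(directions[:, 1])

min_slope = float(np.min(directions @ g))   # smallest sampled value of grad f(x*) d
print("gradient at x*:", g)
print("smallest directional derivative over samples:", min_slope)
```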



7.2 Examples of Unconstrained Problems 

Unconstrained optimization problems occur in a variety of contexts, but most 
frequently when the problem formulation is simple. More complex formulations 
often involve explicit functional constraints. However, problems with constraints
are frequently converted to unconstrained problems, for example by using barrier
functions, as in the analytic center problem for (dual) linear programs. We present a
few more examples here that should begin to indicate the wide scope to which the 
theory applies. 

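Example 1 (Logistic Regression). Recall the classification problem where we have
vectors $\mathbf{a}_i \in E^d$ for $i = 1, 2, \ldots, n_1$ in a class, and vectors $\mathbf{b}_j \in E^d$ for
$j = 1, 2, \ldots, n_2$ not. Then we wish to find $\mathbf{y} \in E^d$ and a number $\beta$ such that

$$ \frac{\exp(\mathbf{a}_i^T \mathbf{y} + \beta)}{1 + \exp(\mathbf{a}_i^T \mathbf{y} + \beta)} $$

is close to 1 for all $i$, and

$$ \frac{\exp(\mathbf{b}_j^T \mathbf{y} + \beta)}{1 + \exp(\mathbf{b}_j^T \mathbf{y} + \beta)} $$

is close to 0 for all $j$. The problem can be cast as an unconstrained optimization
problem, called the max-likelihood,

$$ \text{maximize}_{\mathbf{y}, \beta} \quad \left( \prod_i \frac{\exp(\mathbf{a}_i^T \mathbf{y} + \beta)}{1 + \exp(\mathbf{a}_i^T \mathbf{y} + \beta)} \right) \left( \prod_j \left( 1 - \frac{\exp(\mathbf{b}_j^T \mathbf{y} + \beta)}{1 + \exp(\mathbf{b}_j^T \mathbf{y} + \beta)} \right) \right), $$

which can also be written equivalently, using a logarithmic transformation, as

$$ \text{minimize}_{\mathbf{y}, \beta} \quad \sum_i \log\left(1 + \exp(-\mathbf{a}_i^T \mathbf{y} - \beta)\right) + \sum_j \log\left(1 + \exp(\mathbf{b}_j^T \mathbf{y} + \beta)\right). $$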
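The logarithmic form of the max-likelihood problem is easy to evaluate and differentiate. The sketch below, assuming NumPy, codes the objective and its gradient and takes a few fixed-step gradient steps; the data matrices `A` and `B`, the step size 0.1, and the iteration count are hypothetical choices made only for illustration.

```python
import numpy as np


def logistic_loss(y, beta, A, B):
    """sum_i log(1 + exp(-a_i^T y - beta)) + sum_j log(1 + exp(b_j^T y + beta))."""
    # np.logaddexp(0, t) evaluates log(1 + exp(t)) in a numerically stable way.
    return (np.logaddexp(0.0, -(A @ y + beta)).sum()
            + np.logaddexp(0.0, B @ y + beta).sum())


def logistic_grad(y, beta, A, B):
    """Gradient of logistic_loss with respect to y and beta."""
    sa = 1.0 / (1.0 + np.exp(A @ y + beta))       # 1 - sigmoid(a_i^T y + beta)
    sb = 1.0 / (1.0 + np.exp(-(B @ y + beta)))    # sigmoid(b_j^T y + beta)
    return -A.T @ sa + B.T @ sb, -sa.sum() + sb.sum()


# Rows of A are points in the class, rows of B are points not in it (toy data).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[-1.0, -2.0], [-2.0, -1.0]])

y, beta = np.zeros(2), 0.0
initial_loss = logistic_loss(y, beta, A, B)
for _ in range(50):                               # plain fixed-step gradient descent
    gy, gb = logistic_grad(y, beta, A, B)
    y, beta = y - 0.1 * gy, beta - 0.1 * gb
final_loss = logistic_loss(y, beta, A, B)
print(initial_loss, final_loss)
```

For this separable toy data the infimum of the objective is not attained, so the sketch only illustrates that the objective decreases; it does not claim convergence to a minimizer.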

Example 2 (Utility Maximization). A common problem in economic theory is the
determination of the best way to combine various inputs in order to maximize a
utility function $f(x_1, x_2, \ldots, x_n)$ (in monetary units) of the amounts $x_i$ of the
inputs, $i = 1, 2, \ldots, n$. The unit prices of the inputs are $p_1, p_2, \ldots, p_n$. The
producer wishing to maximize profit must solve the problem

$$ \text{maximize} \quad f(x_1, x_2, \ldots, x_n) - p_1 x_1 - p_2 x_2 - \cdots - p_n x_n. $$

The first-order necessary conditions are that the partial derivatives with respect
to the $x_i$'s each vanish. This leads directly to the $n$ equations

$$ \frac{\partial f}{\partial x_i}(x_1, x_2, \ldots, x_n) = p_i, \quad i = 1, 2, \ldots, n. $$

These equations can be interpreted as stating that, at the solution, the marginal value
due to a small increase in the $i$th input must be equal to the price $p_i$.
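As a small illustration of the marginal-value condition, suppose (purely as an assumption, since the text leaves $f$ general) that the utility is $f(\mathbf{x}) = \sum_i 2 w_i \sqrt{x_i}$ with positive weights $w_i$. Then $\partial f / \partial x_i = w_i / \sqrt{x_i}$, and setting this equal to $p_i$ gives $x_i = (w_i / p_i)^2$. The sketch below, assuming NumPy, checks this; the weights and prices are hypothetical.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])   # hypothetical utility weights
p = np.array([0.5, 1.0, 2.0])   # hypothetical unit prices


def marginal_utility(x):
    """Partial derivatives of the assumed utility f(x) = sum_i 2*w_i*sqrt(x_i)."""
    return w / np.sqrt(x)


def profit(x):
    """Utility minus the cost of the inputs."""
    return np.sum(2.0 * w * np.sqrt(x)) - p @ x


x_star = (w / p) ** 2           # solves marginal_utility(x) = p componentwise
print("inputs  :", x_star)
print("marginal:", marginal_utility(x_star), "prices:", p)
print("profit  :", profit(x_star))
```

Because the assumed utility is concave, the stationary point is in fact a global maximizer of profit for this particular choice.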



Example 3 (Parametric Estimation). A common use of optimization is for the
purpose of function approximation. Suppose, for example, that through an
experiment the value of a function $g$ is observed at $m$ points, $x_1, x_2, \ldots, x_m$. Thus, values
$g(x_1), g(x_2), \ldots, g(x_m)$ are known. We wish to approximate the function by a
polynomial

$$ h(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 $$

of degree $n$ (or less), where $n < m$. Corresponding to any choice of the approximating
polynomial, there will be a set of errors $\varepsilon_k = g(x_k) - h(x_k)$. We define the best
approximation as the polynomial that minimizes the sum of the squares of these
errors; that is, minimizes

$$ \sum_{k=1}^{m} (\varepsilon_k)^2. $$

This in turn means that we minimize

$$ f(\mathbf{a}) = \sum_{k=1}^{m} \left[ g(x_k) - \left( a_n x_k^n + a_{n-1} x_k^{n-1} + \cdots + a_0 \right) \right]^2 $$

with respect to $\mathbf{a} = (a_0, a_1, \ldots, a_n)$ to find the best coefficients. This is a quadratic
expression in the coefficients $\mathbf{a}$. To find a compact representation for this objective
we define, for $i, j = 0, 1, \ldots, n$,

$$ q_{ij} = \sum_{k=1}^{m} (x_k)^{i+j}, \qquad b_j = \sum_{k=1}^{m} g(x_k)(x_k)^j, \qquad c = \sum_{k=1}^{m} g(x_k)^2. $$

Then after a bit of algebra it can be shown that

$$ f(\mathbf{a}) = \mathbf{a}^T \mathbf{Q} \mathbf{a} - 2\mathbf{b}^T \mathbf{a} + c $$

where $\mathbf{Q} = [q_{ij}]$ and $\mathbf{b} = (b_0, b_1, \ldots, b_n)$, indexed to match $\mathbf{a}$.

The first-order necessary conditions state that the gradient of $f$ must vanish. This
leads directly to the system of $n + 1$ equations

$$ \mathbf{Q}\mathbf{a} = \mathbf{b}. $$

These can be solved to determine $\mathbf{a}$.
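The system $\mathbf{Q}\mathbf{a} = \mathbf{b}$ can be assembled directly from the sample points. The following is a minimal sketch, assuming NumPy; the helper name `fit_polynomial` and the test data (samples of a known quadratic) are ours and serve only as an illustration.

```python
import numpy as np


def fit_polynomial(x, g, n):
    """Return a = (a_0, ..., a_n) solving the normal equations Q a = b."""
    powers = np.arange(n + 1)
    V = x[:, None] ** powers[None, :]   # V[k, i] = x_k ** i
    Q = V.T @ V                         # q_ij = sum_k x_k ** (i + j)
    b = V.T @ g                         # b_i  = sum_k g(x_k) * x_k ** i
    return np.linalg.solve(Q, b)


x = np.linspace(-1.0, 1.0, 9)           # m = 9 observation points
g = 1.0 - 2.0 * x + 3.0 * x**2          # observed values of a known quadratic
a = fit_polynomial(x, g, n=2)
print("coefficients (a_0, a_1, a_2):", a)
```

For larger degrees the matrix $\mathbf{Q}$ tends to become ill-conditioned, so in practice one would more likely call a least-squares routine such as `np.linalg.lstsq` on `V` rather than form $\mathbf{Q}$ explicitly.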

Example 4 (Selection Problem). It is often necessary to select an assortment of
factors to meet a given set of requirements. An example is the problem faced by an
electric utility when selecting its power-generating facilities. The level of power
that the company must supply varies by time of the day, by day of the week, and
by season. Its power-generating requirements are summarized by a curve, $h(x)$, as
shown in Fig. 7.2a, which shows the total hours in a year that a power level of at
least $x$ is required for each $x$. For convenience the curve is normalized so that the
upper limit is unity.

The power company may meet these requirements by installing generating
equipment, such as (1) nuclear or (2) coal-fired, or by purchasing power from a central
energy grid. Associated with type $i$ ($i = 1, 2$) of generating equipment is a yearly
unit capital cost $b_i$ and a unit operating cost $c_i$. The unit price of power purchased
from the grid is $c_3$.

Nuclear plants have a high capital cost and low operating cost, so they are used to
supply a base load. Coal-fired plants are used for the intermediate level, and power
is purchased directly only for peak demand periods. The requirements are satisfied
as shown in Fig. 7.2b, where $x_1$ and $x_2$ denote the capacities of the nuclear and
coal-fired plants, respectively. (For example, the nuclear power plant can be visualized
as consisting of $x_1/\Delta$ small generators of capacity $\Delta$, where $\Delta$ is small. The first
such generator is on for about $h(\Delta)$ hours, supplying $\Delta h(\Delta)$ units of energy; the
next supplies $\Delta h(2\Delta)$ units, and so forth. The total energy supplied by the nuclear
plant is thus the area shown.)
The total cost is

$$ f(x_1, x_2) = b_1 x_1 + b_2 x_2 + c_1 \int_0^{x_1} h(x)\,dx + c_2 \int_{x_1}^{x_1 + x_2} h(x)\,dx + c_3 \int_{x_1 + x_2}^{1} h(x)\,dx, $$

and the company wishes to minimize this over the set defined by

$$ x_1 \ge 0, \quad x_2 \ge 0, \quad x_1 + x_2 \le 1. $$

Fig. 7.2 (a) Power requirement curve; (b) $x_1$ and $x_2$ denote the capacities of the nuclear and
coal-fired plants, respectively

Assuming that the solution is interior to the constraints, by setting the partial
derivatives equal to zero, we obtain the two equations

$$ \begin{aligned} b_1 + (c_1 - c_2)h(x_1) + (c_2 - c_3)h(x_1 + x_2) &= 0 \\ b_2 + (c_2 - c_3)h(x_1 + x_2) &= 0, \end{aligned} $$

which represent the necessary conditions.

If $x_1 = 0$, then the general necessary condition theorem shows that the first
equality could relax to $\ge 0$. Likewise, if $x_2 = 0$, then the second equality could relax to
$\ge 0$. The case $x_1 + x_2 = 1$ requires a bit more analysis (see Exercise 2).
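To see the two necessary conditions at work, one can pick a concrete requirement curve. The sketch below assumes, purely for illustration, the linear curve $h(x) = 1 - x$ (with hours also normalized to one) and hypothetical cost figures; none of these numbers come from the text. Subtracting the second equation from the first gives $h(x_1) = (b_1 - b_2)/(c_2 - c_1)$, and the second equation gives $h(x_1 + x_2) = b_2/(c_3 - c_2)$.

```python
b1, b2 = 3.0, 1.0               # hypothetical yearly unit capital costs
c1, c2, c3 = 1.0, 5.0, 10.0     # hypothetical operating costs and grid price


def h(x):
    """Assumed requirement curve (not from the text)."""
    return 1.0 - x


def H(x):
    """Integral of h from 0 to x."""
    return x - 0.5 * x**2


def total_cost(x1, x2):
    s = x1 + x2
    return (b1 * x1 + b2 * x2 + c1 * H(x1)
            + c2 * (H(s) - H(x1)) + c3 * (H(1.0) - H(s)))


def residuals(x1, x2):
    """Left-hand sides of the two necessary conditions."""
    r1 = b1 + (c1 - c2) * h(x1) + (c2 - c3) * h(x1 + x2)
    r2 = b2 + (c2 - c3) * h(x1 + x2)
    return r1, r2


# With h(x) = 1 - x the two conditions can be solved in closed form.
x1 = 1.0 - (b1 - b2) / (c2 - c1)
x2 = 1.0 - b2 / (c3 - c2) - x1
print("capacities:", x1, x2)
print("residuals :", residuals(x1, x2))
print("total cost:", total_cost(x1, x2))
```

With these assumed figures the solution lies in the interior of the constraint set, so the equality form of the conditions applies.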



7.3 Second-Order Conditions 

The proof of Proposition 1 in Sect. 7.1 is based on making a first-order
approximation to the function $f$ in the neighborhood of the relative minimum point.
Additional conditions can be obtained by considering higher-order approximations.
The second-order conditions, which are defined in terms of the Hessian matrix
$\nabla^2 f$ of second partial derivatives of $f$ (see Appendix A), are of extreme
theoretical importance and dominate much of the analysis presented in later chapters.

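Proposition 1 (Second-Order Necessary Conditions). Let $\Omega$ be a subset of $E^n$ and let
$f \in C^2$ be a function on $\Omega$. If $\mathbf{x}^*$ is a relative minimum point of $f$ over $\Omega$, then for any
$\mathbf{d} \in E^n$ that is a feasible direction at $\mathbf{x}^*$ we have

$$ \text{i)} \quad \nabla f(\mathbf{x}^*)\mathbf{d} \ge 0 \tag{7.3} $$

$$ \text{ii)} \quad \text{if } \nabla f(\mathbf{x}^*)\mathbf{d} = 0, \text{ then } \mathbf{d}^T \nabla^2 f(\mathbf{x}^*)\mathbf{d} \ge 0. \tag{7.4} $$

Proof. The first condition is just Proposition 1 of Sect. 7.1, and the second applies only if
$\nabla f(\mathbf{x}^*)\mathbf{d} = 0$. In this case, introducing $\mathbf{x}(\alpha) = \mathbf{x}^* + \alpha \mathbf{d}$ and $g(\alpha) = f(\mathbf{x}(\alpha))$ as
before, we have, in view of $g'(0) = 0$,

$$ g(\alpha) - g(0) = \tfrac{1}{2} g''(0)\alpha^2 + o(\alpha^2). $$

If $g''(0) < 0$ the right side of the above equation is negative for sufficiently small $\alpha$,
which contradicts the relative minimum nature of $g(0)$. Thus

$$ g''(0) = \mathbf{d}^T \nabla^2 f(\mathbf{x}^*)\mathbf{d} \ge 0. \; ∎ $$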

Example 1. For the same problem as Example 2 of Sect. 7.1, we have for $\mathbf{d} = (d_1, d_2)$

$$ \nabla f(\mathbf{x}^*)\mathbf{d} = \tfrac{3}{2} d_2. $$

Thus condition (ii) of Proposition 1 applies only if $d_2 = 0$. In that case we have
$\mathbf{d}^T \nabla^2 f(\mathbf{x}^*)\mathbf{d} = 2d_1^2 \ge 0$, so condition (ii) is satisfied.

Again of special interest is the case where the minimizing point is an interior
point of $\Omega$, as, for example, in the case of completely unconstrained problems.
We then obtain the following classical result.

Proposition 2 (Second-Order Necessary Conditions, Unconstrained Case). Let $\mathbf{x}^*$ be
an interior point of the set $\Omega$, and suppose $\mathbf{x}^*$ is a relative minimum point over $\Omega$ of the
function $f \in C^2$. Then

$$ \text{i)} \quad \nabla f(\mathbf{x}^*) = \mathbf{0} \tag{7.5} $$

$$ \text{ii)} \quad \text{for all } \mathbf{d}, \quad \mathbf{d}^T \nabla^2 f(\mathbf{x}^*)\mathbf{d} \ge 0. \tag{7.6} $$

For notational simplicity we often denote $\nabla^2 f(\mathbf{x})$, the $n \times n$ matrix of the second
partial derivatives of $f$, the Hessian of $f$, by the alternative notation $\mathbf{F}(\mathbf{x})$.
Condition (ii) is equivalent to stating that the matrix $\mathbf{F}(\mathbf{x}^*)$ is positive semidefinite. As
we shall see, the matrix $\mathbf{F}(\mathbf{x}^*)$, which arises here quite naturally in a discussion of
necessary conditions, plays a fundamental role in the analysis of iterative methods
for solving unconstrained optimization problems. The structure of this matrix is the
primary determinant of the rate of convergence of algorithms designed to minimize
the function $f$.
Example 2. Consider the problem

$$ \begin{array}{ll} \text{minimize} & f(x_1, x_2) = x_1^3 - x_1^2 x_2 + 2x_2^2 \\ \text{subject to} & x_1 \ge 0, \quad x_2 \ge 0. \end{array} $$

If we assume that the solution is in the interior of the feasible set, that is, if
$x_1 > 0$, $x_2 > 0$, then the first-order necessary conditions are

$$ 3x_1^2 - 2x_1 x_2 = 0, \qquad -x_1^2 + 4x_2 = 0. $$

There is a solution to these at $x_1 = x_2 = 0$, which is a boundary point, but there is
also a solution at $x_1 = 6$, $x_2 = 9$. We note that for $x_1$ fixed at $x_1 = 6$, the objective
attains a relative minimum with respect to $x_2$ at $x_2 = 9$. Conversely, with $x_2$ fixed
at $x_2 = 9$, the objective attains a relative minimum with respect to $x_1$ at $x_1 = 6$.
Despite this fact, the point $x_1 = 6$, $x_2 = 9$ is not a relative minimum point, because
the Hessian matrix is

$$ \mathbf{F} = \begin{bmatrix} 6x_1 - 2x_2 & -2x_1 \\ -2x_1 & 4 \end{bmatrix}, $$

which, evaluated at the proposed solution $x_1 = 6$, $x_2 = 9$, is

$$ \mathbf{F} = \begin{bmatrix} 18 & -12 \\ -12 & 4 \end{bmatrix}. $$

This matrix is not positive semidefinite, since its determinant is negative. Thus the
proposed solution is not a relative minimum point.
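The failure of the second-order necessary condition at $(6, 9)$ can be confirmed numerically. The sketch below, assuming NumPy, computes the eigenvalues of the Hessian and moves a short distance along the eigenvector of the negative eigenvalue; the step length `t` is an arbitrary small value chosen for illustration.

```python
import numpy as np


def f(x):
    """Objective of Example 2."""
    return x[0]**3 - x[0]**2 * x[1] + 2.0 * x[1]**2


def hessian(x):
    """Matrix of second partial derivatives of f."""
    return np.array([[6.0 * x[0] - 2.0 * x[1], -2.0 * x[0]],
                     [-2.0 * x[0], 4.0]])


x_star = np.array([6.0, 9.0])
eigvals, eigvecs = np.linalg.eigh(hessian(x_star))   # eigenvalues in ascending order
d = eigvecs[:, 0]                                    # direction of negative curvature

t = 1e-2
decrease = f(x_star + t * d) - f(x_star)             # negative: f decreases along d
print("eigenvalues:", eigvals)
print("change in f along d:", decrease)
```

One eigenvalue is negative, so the objective decreases along the corresponding direction even though it increases along each coordinate axis.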



Sufficient Conditions for a Relative Minimum 

By slightly strengthening the second condition of Proposition 2 above, we obtain a 
set of conditions that imply that the point x* is a relative minimum. We give here 
the conditions that apply only to unconstrained problems, or to problems where the 
minimum point is interior to the feasible region, since the corresponding conditions 
for problems where the minimum is achieved on a boundary point of the feasible 
set are a good deal more difficult and of marginal practical or theoretical value. 
A more general result, applicable to problems with functional constraints, is given 
in Chap. 11. 

Proposition 3 (Second-Order Sufficient Conditions, Unconstrained Case). Let $f \in C^2$
be a function defined on a region in which the point $\mathbf{x}^*$ is an interior point. Suppose in addition
that

$$ \text{i)} \quad \nabla f(\mathbf{x}^*) = \mathbf{0} \tag{7.7} $$

$$ \text{ii)} \quad \mathbf{F}(\mathbf{x}^*) \text{ is positive definite.} \tag{7.8} $$

Then $\mathbf{x}^*$ is a strict relative minimum point of $f$.

Proof. Since $\mathbf{F}(\mathbf{x}^*)$ is positive definite, there is an $a > 0$ such that for all $\mathbf{d}$,
$\mathbf{d}^T \mathbf{F}(\mathbf{x}^*)\mathbf{d} \ge a|\mathbf{d}|^2$. Thus by Taylor’s Theorem (with remainder)

$$ \begin{aligned} f(\mathbf{x}^* + \mathbf{d}) - f(\mathbf{x}^*) &= \tfrac{1}{2}\mathbf{d}^T \mathbf{F}(\mathbf{x}^*)\mathbf{d} + o(|\mathbf{d}|^2) \\ &\ge (a/2)|\mathbf{d}|^2 + o(|\mathbf{d}|^2). \end{aligned} $$

For small $|\mathbf{d}|$ the first term on the right dominates the second, implying that both
sides are positive for small $\mathbf{d}$. ∎
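The two sufficient conditions translate into a short numerical test: a vanishing gradient and a positive definite Hessian. The sketch below, assuming NumPy, uses a Cholesky factorization as the definiteness test; the helper name and the tolerance are our own choices. It is applied to the stationary point of Example 1 of Sect. 7.1 and to the point $(6, 9)$ of Example 2 above.

```python
import numpy as np


def satisfies_second_order_sufficient(gradient, hessian, tol=1e-10):
    """Check grad f(x*) = 0 and F(x*) positive definite (hessian assumed symmetric)."""
    if np.linalg.norm(gradient) > tol:
        return False
    try:
        np.linalg.cholesky(hessian)   # succeeds only for a positive definite matrix
        return True
    except np.linalg.LinAlgError:
        return False


# Example 1 of Sect. 7.1 at (1, 2): gradient vanishes, Hessian is [[2, -1], [-1, 2]].
ok_example_1 = satisfies_second_order_sufficient(
    np.zeros(2), np.array([[2.0, -1.0], [-1.0, 2.0]]))

# Example 2 of this section at (6, 9): gradient vanishes, Hessian is indefinite.
ok_example_2 = satisfies_second_order_sufficient(
    np.zeros(2), np.array([[18.0, -12.0], [-12.0, 4.0]]))

print(ok_example_1, ok_example_2)
```

A `False` result only means the sufficient conditions are not verified; by itself it does not rule out a relative minimum, since the conditions are sufficient and not necessary.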



7.4 Convex and Concave Functions 

In order to develop a theory directed toward characterizing global, rather than local, 
minimum points, it is necessary to introduce some sort of convexity assumptions. 
This results not only in a more potent, although more restrictive, theory but also pro- 
vides an interesting geometric interpretation of the second-order sufficiency result 
derived above. 

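Definition. A function $f$ defined on a convex set $\Omega$ is said to be convex if, for every $\mathbf{x}_1, \mathbf{x}_2 \in
\Omega$ and every $\alpha$, $0 \le \alpha \le 1$, there holds

$$ f(\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2) \le \alpha f(\mathbf{x}_1) + (1 - \alpha) f(\mathbf{x}_2). $$

If, for every $\alpha$, $0 < \alpha < 1$, and $\mathbf{x}_1 \ne \mathbf{x}_2$, there holds

$$ f(\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2) < \alpha f(\mathbf{x}_1) + (1 - \alpha) f(\mathbf{x}_2), $$

then $f$ is said to be strictly convex.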

Several examples of convex or nonconvex functions are shown in Fig. 7.3. 
Geometrically, a function is convex if the line joining two points on its graph lies 
nowhere below the graph, as shown in Fig. 7.3a, or, thinking of a function in two 
dimensions, it is convex if its graph is bowl shaped. 

Next we turn to the definition of a concave function. 

Definition. A function $g$ defined on a convex set $\Omega$ is said to be concave if the function
$f = -g$ is convex. The function $g$ is strictly concave if $-g$ is strictly convex.



Combinations of Convex Functions 

We show that convex functions can be combined to yield new convex functions and 
that convex functions when used as constraints yield convex constraint sets. 

Proposition 1. Let $f_1$ and $f_2$ be convex functions on the convex set $\Omega$. Then the function
$f_1 + f_2$ is convex on $\Omega$.

Proof. Let $\mathbf{x}_1, \mathbf{x}_2 \in \Omega$, and $0 < \alpha < 1$. Then

$$ f_1(\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2) + f_2(\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2) \le \alpha[f_1(\mathbf{x}_1) + f_2(\mathbf{x}_1)] + (1 - \alpha)[f_1(\mathbf{x}_2) + f_2(\mathbf{x}_2)]. \; ∎ $$

Proposition 2. Let $f$ be a convex function over the convex set $\Omega$. Then the function $af$ is
convex for any $a \ge 0$.

Proof. Immediate. ∎

Note that through repeated application of the above two propositions it follows
that a positive combination $a_1 f_1 + a_2 f_2 + \cdots + a_m f_m$ of convex functions is again
convex.

Finally, we consider sets defined by convex inequality constraints.

Proposition 3. Let $f$ be a convex function on a convex set $\Omega$. The set $\Gamma_c = \{\mathbf{x} : \mathbf{x} \in
\Omega,\ f(\mathbf{x}) \le c\}$ is convex for every real number $c$.

Proof. Let $\mathbf{x}_1, \mathbf{x}_2 \in \Gamma_c$. Then $f(\mathbf{x}_1) \le c$, $f(\mathbf{x}_2) \le c$ and for $0 < \alpha < 1$,

$$ f(\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2) \le \alpha f(\mathbf{x}_1) + (1 - \alpha) f(\mathbf{x}_2) \le c. $$

Thus $\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2 \in \Gamma_c$. ∎

We note that, since the intersection of convex sets is also convex, the set of points
simultaneously satisfying

$$ f_1(\mathbf{x}) \le c_1, \quad f_2(\mathbf{x}) \le c_2, \quad \ldots, \quad f_m(\mathbf{x}) \le c_m, $$

where each $f_i$ is a convex function, defines a convex set. This is important in
mathematical programming, since the constraint set is often defined this way.



Properties of Differentiable Convex Functions 

If a function / is differentiable, then there are alternative characterizations of con- 
vexity. 

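Proposition 4. Let $f \in C^1$. Then $f$ is convex over a convex set $\Omega$ if and only if

$$ f(\mathbf{y}) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})(\mathbf{y} - \mathbf{x}) \tag{7.9} $$

for all $\mathbf{x}, \mathbf{y} \in \Omega$.

Proof. First suppose $f$ is convex. Then for all $\alpha$, $0 \le \alpha \le 1$,

$$ f(\alpha \mathbf{y} + (1 - \alpha)\mathbf{x}) \le \alpha f(\mathbf{y}) + (1 - \alpha) f(\mathbf{x}). $$

Thus for $0 < \alpha \le 1$

$$ \frac{f(\mathbf{x} + \alpha(\mathbf{y} - \mathbf{x})) - f(\mathbf{x})}{\alpha} \le f(\mathbf{y}) - f(\mathbf{x}). $$

Letting $\alpha \to 0$ we obtain

$$ \nabla f(\mathbf{x})(\mathbf{y} - \mathbf{x}) \le f(\mathbf{y}) - f(\mathbf{x}). $$

This proves the “only if” part.

Now assume

$$ f(\mathbf{y}) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})(\mathbf{y} - \mathbf{x}) $$

for all $\mathbf{x}, \mathbf{y} \in \Omega$. Fix $\mathbf{x}_1, \mathbf{x}_2 \in \Omega$ and $\alpha$, $0 \le \alpha \le 1$. Setting $\mathbf{x} = \alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2$ and
alternatively $\mathbf{y} = \mathbf{x}_1$ or $\mathbf{y} = \mathbf{x}_2$, we have

$$ f(\mathbf{x}_1) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})(\mathbf{x}_1 - \mathbf{x}) \tag{7.10} $$

$$ f(\mathbf{x}_2) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})(\mathbf{x}_2 - \mathbf{x}). \tag{7.11} $$

Multiplying (7.10) by $\alpha$ and (7.11) by $(1 - \alpha)$ and adding, we obtain

$$ \alpha f(\mathbf{x}_1) + (1 - \alpha) f(\mathbf{x}_2) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})[\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2 - \mathbf{x}]. $$

But substituting $\mathbf{x} = \alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2$, we obtain

$$ \alpha f(\mathbf{x}_1) + (1 - \alpha) f(\mathbf{x}_2) \ge f(\alpha \mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2). \; ∎ $$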

The statement of the above proposition is illustrated in Fig. 7.4. It can be regarded 
as a sort of dual characterization of the original definition illustrated in Fig. 7.3. 
The original definition essentially states that linear interpolation between two points 
overestimates the function, while the above proposition states that linear approxima- 
tion based on the local derivative underestimates the function. 
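The inequality of Proposition 4 can be spot-checked on a convex function by sampling pairs of points. The sketch below, assuming NumPy, uses the quadratic objective of Example 1 of Sect. 7.1 (which is convex because its Hessian is positive definite) and measures the gap between the function and its linear approximation; the random sampling is our own illustrative device and is no substitute for the proof.

```python
import numpy as np


def f(x):
    """Convex quadratic from Example 1 of Sect. 7.1."""
    return x[0]**2 - x[0] * x[1] + x[1]**2 - 3.0 * x[1]


def grad_f(x):
    return np.array([2.0 * x[0] - x[1], -x[0] + 2.0 * x[1] - 3.0])


rng = np.random.default_rng(1)
gaps = []
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # gap = f(y) - [f(x) + grad f(x)(y - x)], which (7.9) says is nonnegative
    gaps.append(f(y) - f(x) - grad_f(x) @ (y - x))

min_gap = min(gaps)
print("smallest gap over samples:", min_gap)
```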

For twice continuously differentiable functions, there is another characterization 
of convexity. 




Fig. 7.4 Illustration of Proposition 4 
Proposition 5. Let $f \in C^2$. Then $f$ is convex over a convex set $\Omega$ containing an interior
point if and only if the Hessian matrix $\mathbf{F}$ of $f$ is positive semidefinite throughout $\Omega$.

Proof. By Taylor’s theorem we have

$$ f(\mathbf{y}) = f(\mathbf{x}) + \nabla f(\mathbf{x})(\mathbf{y} - \mathbf{x}) + \tfrac{1}{2}(\mathbf{y} - \mathbf{x})^T \mathbf{F}(\mathbf{x} + \alpha(\mathbf{y} - \mathbf{x}))(\mathbf{y} - \mathbf{x}) \tag{7.12} $$

for some $\alpha$, $0 \le \alpha \le 1$. Clearly, if the Hessian is everywhere positive semidefinite,
we have

$$ f(\mathbf{y}) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})(\mathbf{y} - \mathbf{x}), \tag{7.13} $$

which in view of Proposition 4 implies that $f$ is convex.

Now suppose the Hessian is not positive semidefinite at some point $\mathbf{x} \in \Omega$.
By continuity of the Hessian it can be assumed, without loss of generality, that $\mathbf{x}$
is an interior point of $\Omega$. There is a $\mathbf{y} \in \Omega$ such that $(\mathbf{y} - \mathbf{x})^T \mathbf{F}(\mathbf{x})(\mathbf{y} - \mathbf{x}) < 0$. Again
by the continuity of the Hessian, $\mathbf{y}$ may be selected so that for all $\alpha$, $0 \le \alpha \le 1$,

$$ (\mathbf{y} - \mathbf{x})^T \mathbf{F}(\mathbf{x} + \alpha(\mathbf{y} - \mathbf{x}))(\mathbf{y} - \mathbf{x}) < 0. $$

This in view of (7.12) implies that (7.13) does not hold, which in view of
Proposition 4 implies that $f$ is not convex. ∎

The Hessian matrix is the generalization to $E^n$ of the concept of the curvature of a
function, and correspondingly, positive definiteness of the Hessian is the
generalization of positive curvature. Convex functions have positive (or at least nonnegative)
curvature in every direction. Motivated by these observations, we sometimes refer 
to a function as being locally convex if its Hessian matrix is positive semidefinite 
in a small region, and locally strictly convex if the Hessian is positive definite in 
the region. In these terms we see that the second-order sufficiency result of the last 
section requires that the function be locally strictly convex at the point x*. Thus, 
even the local theory, derived solely in terms of the elementary calculus, is actually 
intimately related to convexity — at least locally. For this reason we can view the two 
theories, local and global, not as disjoint parallel developments but as complemen- 
tary and interactive. Results that are based on convexity apply even to nonconvex 
problems in a region near the solution, and conversely, local results apply to a global 
minimum point. 



7.5 Minimization and Maximization of Convex Functions 

We turn now to the three classic results concerning minimization or maximization 
of convex functions. 

Theorem 1. Let $f$ be a convex function defined on the convex set $\Omega$. Then the set $\Gamma$ where $f$
achieves its minimum is convex, and any relative minimum of $f$ is a global minimum.

Proof. If $f$ has no relative minima the theorem is valid by default. Assume now that
$c_0$ is the minimum of $f$. Then clearly $\Gamma = \{\mathbf{x} : f(\mathbf{x}) \le c_0,\ \mathbf{x} \in \Omega\}$ and this is convex
by Proposition 3 of the last section.

Suppose now that $\mathbf{x}^* \in \Omega$ is a relative minimum point of $f$, but that there is
another point $\mathbf{y} \in \Omega$ with $f(\mathbf{y}) < f(\mathbf{x}^*)$. On the line $\alpha \mathbf{y} + (1 - \alpha)\mathbf{x}^*$, $0 < \alpha < 1$, we
have

$$ f(\alpha \mathbf{y} + (1 - \alpha)\mathbf{x}^*) \le \alpha f(\mathbf{y}) + (1 - \alpha) f(\mathbf{x}^*) < f(\mathbf{x}^*), $$

contradicting the fact that $\mathbf{x}^*$ is a relative minimum point. ∎

We might paraphrase the above theorem as saying that for convex functions, all 
minimum points are located together (in a convex set) and all relative minima are 
global minima. The next theorem says that if $f$ is continuously differentiable and
convex, then satisfaction of the first-order necessary conditions is both necessary
and sufficient for a point to be a global minimizing point.

Theorem 2. Let $f \in C^1$ be convex on the convex set $\Omega$. If there is a point $\mathbf{x}^* \in \Omega$ such that,
for all $\mathbf{y} \in \Omega$, $\nabla f(\mathbf{x}^*)(\mathbf{y} - \mathbf{x}^*) \ge 0$, then $\mathbf{x}^*$ is a global minimum point of $f$ over $\Omega$.

Proof. We note parenthetically that since $\mathbf{y} - \mathbf{x}^*$ is a feasible direction at $\mathbf{x}^*$, the given
condition is equivalent to the first-order necessary condition stated in Sect. 7.1. The
proof of the theorem is immediate, since by Proposition 4 of the last section

$$ f(\mathbf{y}) \ge f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*)(\mathbf{y} - \mathbf{x}^*) \ge f(\mathbf{x}^*). \; ∎ $$

Next we turn to the question of maximizing a convex function over a convex set. 
There is, however, no analog of Theorem 1 for maximization; indeed, the tendency 
is for the occurrence of numerous nonglobal relative maximum points. Nevertheless, 
it is possible to prove one important result. It is not used in subsequent chapters, 
but it is useful for some areas of optimization. 

Theorem 3. Let $f$ be a convex function defined on the bounded, closed convex set $\Omega$. If $f$
has a maximum over $\Omega$ it is achieved at an extreme point of $\Omega$.

Proof. Suppose $f$ achieves a global maximum at $\mathbf{x}^* \in \Omega$. We show first that this
maximum is achieved at some boundary point of $\Omega$. If $\mathbf{x}^*$ is itself a boundary point,
then there is nothing to prove, so assume $\mathbf{x}^*$ is not a boundary point. Let $L$ be any
line passing through the point $\mathbf{x}^*$. The intersection of this line with $\Omega$ is an interval
of the line $L$ having end points $\mathbf{y}_1$, $\mathbf{y}_2$ which are boundary points of $\Omega$, and we have
$\mathbf{x}^* = \alpha\mathbf{y}_1 + (1 - \alpha)\mathbf{y}_2$ for some $\alpha$, $0 < \alpha < 1$. By convexity of $f$

$$f(\mathbf{x}^*) \le \alpha f(\mathbf{y}_1) + (1 - \alpha) f(\mathbf{y}_2) \le \max\{f(\mathbf{y}_1), f(\mathbf{y}_2)\}.$$

Thus either $f(\mathbf{y}_1)$ or $f(\mathbf{y}_2)$ must be at least as great as $f(\mathbf{x}^*)$. Since $\mathbf{x}^*$ is a maximum
point, so is either $\mathbf{y}_1$ or $\mathbf{y}_2$.

We have shown that the maximum, if achieved, must be achieved at a boundary
point of $\Omega$. If this boundary point, $\mathbf{x}^*$, is an extreme point of $\Omega$ there is nothing
more to prove. If it is not an extreme point, consider the intersection of $\Omega$ with a
supporting hyperplane $H$ at $\mathbf{x}^*$. This intersection, $T_1$, is of dimension $n - 1$ or less
and the global maximum of $f$ over $T_1$ is equal to $f(\mathbf{x}^*)$ and must be achieved at a
boundary point $\mathbf{x}_1$ of $T_1$. If this boundary point is an extreme point of $T_1$, it is also an
extreme point of $\Omega$ by Lemma 1, Sect. B.4, and hence the theorem is proved. If $\mathbf{x}_1$ is
not an extreme point of $T_1$, we form $T_2$, the intersection of $T_1$ with a hyperplane in
$E^{n-1}$ supporting $T_1$ at $\mathbf{x}_1$. This process can continue at most a total of $n$ times when a
set $T_n$ of dimension zero, consisting of a single point, is obtained. This single point
is an extreme point of $T_n$ and also, by repeated application of Lemma 1, Sect. B.4,
an extreme point of $\Omega$. ∎
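As a numerical illustration of Theorem 3, the following sketch compares the values of a convex function at the vertices of the unit square with its values at many sampled feasible points. The particular function, the square, and the sample size are assumptions chosen only for illustration; they do not come from the text.

```python
import itertools
import random

def f(x):
    """Convex function (illustrative choice): squared distance from (0.3, 0.4)."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.4) ** 2

# Extreme points (vertices) of the unit square [0, 1] x [0, 1].
vertices = list(itertools.product([0.0, 1.0], repeat=2))
best_vertex = max(vertices, key=f)

# Sample many feasible points; none should exceed the best vertex value.
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(10_000)]
best_sample_value = max(f(p) for p in samples)

print("best vertex:", best_vertex, "value:", f(best_vertex))
print("largest sampled value:", best_sample_value)
```

The sampling does not prove anything; it only makes visible that the largest value over the square is found at a corner, as the theorem asserts.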



*7.6 Zero-Order Conditions

We have considered the problem

minimize $f(\mathbf{x})$
subject to $\mathbf{x} \in \Omega$    (7.14)

to be unconstrained because there are no functional constraints of the form $g(\mathbf{x}) \le b$
or $h(\mathbf{x}) = c$. However, the problem is of course constrained by the set $\Omega$. This
constraint influences the first- and second-order necessary and sufficient conditions
through the relation between feasible directions and derivatives of the function $f$.
Nevertheless, there is a way to treat this constraint without reference to derivatives.
The resulting conditions are then of zero order. These necessary conditions require
that the problem be convex in a certain way, while the sufficient conditions require
no assumptions at all. The simplest assumptions for the necessary conditions are that
$\Omega$ is a convex set and that $f$ is a convex function on all of $E^n$.




Fig. 7.5 The epigraph, the tubular region, and the hyperplane

To derive the necessary conditions under these assumptions consider the set $\Gamma \subset
E^{n+1} = \{(r, \mathbf{x}) : r \ge f(\mathbf{x}), \mathbf{x} \in E^n\}$. In a figure of the graph of $f$, the set $\Gamma$ is the
region above the graph, shown in the upper part of Fig. 7.5. This set is called the
epigraph of $f$. It is easy to verify that the set $\Gamma$ is convex if $f$ is a convex function.

Suppose that $\mathbf{x}^* \in \Omega$ is the minimizing point with value $f^* = f(\mathbf{x}^*)$. We construct
a tubular region with cross section $\Omega$ and extending vertically from $-\infty$ up to $f^*$,
shown as $B$ in the upper part of Fig. 7.5. This is also a convex set, and it overlaps
the set $\Gamma$ only at the boundary point $(f^*, \mathbf{x}^*)$ above $\mathbf{x}^*$ (or possibly many boundary
points if $f$ is flat near $\mathbf{x}^*$).

According to the separating hyperplane theorem (Appendix B), there is a hyperplane
separating these two sets. This hyperplane can be represented by a nonzero
vector of the form $(s, \boldsymbol{\lambda}) \in E^{n+1}$ with $s$ a scalar and $\boldsymbol{\lambda} \in E^n$, and a separation constant
$c$. The separation conditions are

$sr + \boldsymbol{\lambda}^T\mathbf{x} \ge c$ for all $\mathbf{x} \in E^n$ and $r \ge f(\mathbf{x})$    (7.15)

$sr + \boldsymbol{\lambda}^T\mathbf{x} \le c$ for all $\mathbf{x} \in \Omega$ and $r \le f^*$.    (7.16)

It follows that $s \ne 0$; for otherwise $\boldsymbol{\lambda} \ne \mathbf{0}$ and then (7.15) would be violated for some
$\mathbf{x} \in E^n$. It also follows that $s \ge 0$ since otherwise (7.16) would be violated by very
negative values of $r$. Hence, together we find $s > 0$ and by appropriate scaling we
may take $s = 1$.

It is easy to see that the above conditions can be expressed alternatively as two 
optimization problems, as stated in the following proposition. 

Proposition 1 (Zero-Order Necessary Conditions). If $\mathbf{x}^*$ solves (7.14) under the stated
convexity conditions, then there is a nonzero vector $\boldsymbol{\lambda} \in E^n$ such that $\mathbf{x}^*$ is a solution to the
two problems:

minimize $f(\mathbf{x}) + \boldsymbol{\lambda}^T\mathbf{x}$
subject to $\mathbf{x} \in E^n$    (7.17)

and

maximize $\boldsymbol{\lambda}^T\mathbf{x}$
subject to $\mathbf{x} \in \Omega$    (7.18)

Proof. Problem (7.17) follows from (7.15) (with $s = 1$) and the fact that $f(\mathbf{x}) \le r$
for $r \ge f(\mathbf{x})$. The value $c$ is attained from above at $(f^*, \mathbf{x}^*)$. Likewise (7.18) follows
from (7.16) and the fact that $\mathbf{x}^*$ and the appropriate $r$ attain $c$ from below. ∎

Notice that problem (7.17) is completely unconstrained, since $\mathbf{x}$ may range over
all of $E^n$. The second problem (7.18) is constrained by $\Omega$ but has a linear objective
function. It is clear from Fig. 7.5 that the slope of the hyperplane is equal to the
slope of the function $f$ when $f$ is continuously differentiable at the solution $\mathbf{x}^*$.

If the optimal solution $\mathbf{x}^*$ is in the interior of $\Omega$, then the second problem (7.18)
implies that $\boldsymbol{\lambda} = \mathbf{0}$, for otherwise there would be a direction of movement from $\mathbf{x}^*$
that increases the product $\boldsymbol{\lambda}^T\mathbf{x}$ above $\boldsymbol{\lambda}^T\mathbf{x}^*$. The hyperplane is horizontal in that case.
The zeroth-order conditions provide no new information in this situation. However,
when the solution is on a boundary point of $\Omega$ the conditions give very useful
information.

Example 1 (Minimization Over an Interval). Consider a continuously differentiable
function $f$ of a single variable $x \in E^1$ defined on the unit interval [0, 1] which plays
the role of $\Omega$ here. The first problem (7.17) implies $f'(x^*) = -\lambda$. If the solution is
at the left end of the interval (at $x = 0$) then the second problem (7.18) implies that
$\lambda \le 0$ which means that $f'(x^*) \ge 0$. The reverse holds if $x^*$ is at the right end. These
together are identical to the first-order conditions of Sect. 7.1.
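The two problems of Proposition 1 can be checked numerically for a concrete instance of Example 1. The sketch below assumes the illustrative function f(x) = (x + 1)^2 on [0, 1], whose constrained minimum is at the left end point, and uses coarse grids as stand-ins for the real line and the interval. The function, the grids, and all names are assumptions made for this illustration.

```python
def f(x):
    """Illustrative convex function; its minimum over [0, 1] is at x = 0."""
    return (x + 1.0) ** 2

def f_prime(x):
    return 2.0 * (x + 1.0)

x_star = 0.0
lam = -f_prime(x_star)  # from problem (7.17): f'(x*) = -lambda

grid_line = [i / 100.0 for i in range(-500, 501)]  # coarse stand-in for E^1
grid_omega = [i / 100.0 for i in range(0, 101)]    # the interval [0, 1]

# Problem (7.17): minimize f(x) + lambda * x over the whole line.
argmin_717 = min(grid_line, key=lambda x: f(x) + lam * x)
# Problem (7.18): maximize lambda * x over the interval.
argmax_718 = max(grid_omega, key=lambda x: lam * x)
# Original problem (7.14): minimize f over the interval.
argmin_714 = min(grid_omega, key=f)

print(lam, argmin_717, argmax_718, argmin_714)
```

Here the multiplier is negative, so the derivative of f at the left end point is positive, in agreement with the sign condition stated in the example.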

Example 2. As a generalization of the above example, let $f \in C^1$ on $E^n$, and let $f$
have a minimum with respect to $\Omega$ at $\mathbf{x}^*$. Let $\mathbf{d} \in E^n$ be a feasible direction at $\mathbf{x}^*$.
Then it follows again from (7.17) that $\nabla f(\mathbf{x}^*)\mathbf{d} \ge 0$.

Sufficient Conditions Theorem. The conditions of Proposition 1 are sufficient for x* to be 
a minimum even without the convexity assumptions. 

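Proposition 2 (Zero-Order Sufficiency Conditions). If there is a $\boldsymbol{\lambda}$ such that $\mathbf{x}^* \in \Omega$ solves
the problems (7.17) and (7.18), then $\mathbf{x}^*$ solves (7.14).

Proof. Suppose $\mathbf{x}_1$ is any other point in $\Omega$. Then from (7.17)

$$f(\mathbf{x}_1) + \boldsymbol{\lambda}^T\mathbf{x}_1 \ge f(\mathbf{x}^*) + \boldsymbol{\lambda}^T\mathbf{x}^*.$$

This can be rewritten as

$$f(\mathbf{x}_1) - f(\mathbf{x}^*) \ge \boldsymbol{\lambda}^T\mathbf{x}^* - \boldsymbol{\lambda}^T\mathbf{x}_1.$$

By problem (7.18) the right hand side of this is greater than or equal to zero. Hence
$f(\mathbf{x}_1) - f(\mathbf{x}^*) \ge 0$ which establishes the result. ∎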



7.7 Global Convergence of Descent Algorithms 



A good portion of the remainder of this book is devoted to presentation and analysis 
of various algorithms designed to solve nonlinear programming problems. Although 
these algorithms vary substantially in their motivation, application, and detailed 
analysis, ranging from the simple to the highly complex, they have the common 
heritage of all being iterative descent algorithms. By iterative, we mean, roughly,
that the algorithm generates a series of points, each point being calculated on the
basis of the points preceding it. By descent, we mean that as each new point is
generated by the algorithm the corresponding value of some function (evaluated at
the most recent point) decreases in value. Ideally, the sequence of points generated
by the algorithm in this way converges in a finite or infinite number of steps to a
solution of the original problem.

An iterative algorithm is initiated by specifying a starting point. If for arbitrary
starting points the algorithm is guaranteed to generate a sequence of points con- 
verging to a solution, then the algorithm is said to be globally convergent. Quite 
definitely, not all algorithms have this obviously desirable property. Indeed, many of 
the most important algorithms for solving nonlinear programming problems are not 
globally convergent in their purest form and thus occasionally generate sequences 
that either do not converge at all or converge to points that are not solutions. It is 
often possible, however, to modify such algorithms, by appending special devices, 
so as to guarantee global convergence. 

Fortunately, the subject of global convergence can be treated in a unified manner 
through the analysis of a general theory of algorithms developed mainly by Zang- 
will. From this analysis, which is presented in this section, we derive the Global 
Convergence Theorem that is applicable to the study of any iterative descent algo- 
rithm. Frequent reference to this important result is made in subsequent chapters. 



Iterative Algorithms 



We think of an algorithm as a mapping. Given a point $\mathbf{x}$ in some space $X$, the output
of an algorithm applied to $\mathbf{x}$ is a new point. Operated iteratively, an algorithm is
repeatedly reapplied to the new points it generates so as to produce a whole sequence
of points. Thus, as a preliminary definition, we might formally define an algorithm $\mathbf{A}$
as a mapping taking points in a space $X$ into (other) points in $X$. Operated iteratively,
the algorithm $\mathbf{A}$ initiated at $\mathbf{x}_0 \in X$ would generate the sequence $\{\mathbf{x}_k\}$ defined by

$$\mathbf{x}_{k+1} = \mathbf{A}(\mathbf{x}_k).$$



In practice, the mapping A might be defined explicitly by a simple mathematical 
expression or it might be defined implicitly by, say, a lengthy complex computer 
program. Given an input vector, both define a corresponding output. 

With this intuitive idea of an algorithm in mind, we now generalize the concept 
somewhat so as to provide greater flexibility in our analyses. 

Definition. An algorithm $\mathbf{A}$ is a mapping defined on a space $X$ that assigns to every point
$\mathbf{x} \in X$ a subset of $X$.

In this definition the term “space” can be interpreted loosely. Usually $X$ is the
vector space $E^n$ but it may be only a subset of $E^n$ or even a more general metric
space. The most important aspect of the definition, however, is that the mapping $\mathbf{A}$,
rather than being a point-to-point mapping of $X$, is a point-to-set mapping of $X$.

An algorithm $\mathbf{A}$ generates a sequence of points in the following way. Given
$\mathbf{x}_k \in X$ the algorithm yields $\mathbf{A}(\mathbf{x}_k)$ which is a subset of $X$. From this subset an
arbitrary element $\mathbf{x}_{k+1}$ is selected. In this way, given an initial point $\mathbf{x}_0$, the algorithm
generates sequences through the iteration

$$\mathbf{x}_{k+1} \in \mathbf{A}(\mathbf{x}_k).$$

It is clear that, unlike the case where $\mathbf{A}$ is a point-to-point mapping, the sequence
generated by the algorithm $\mathbf{A}$ cannot, in general, be predicted solely from knowledge
of the initial point $\mathbf{x}_0$. This degree of uncertainty is designed to reflect uncertainty
that we may have in practice as to specific details of an algorithm.

Example 1. Suppose for $x$ on the real line we define

$$A(x) = [-|x|/2, \; |x|/2]$$

so that $A(x)$ is an interval of the real line. Starting at $x_0 = 100$, each of the sequences
below might be generated from iterative application of this algorithm.

100, 50, 25, 12, -6, -2, 1, 1/2, ...

100, -40, 20, -5, -2, 1, 1/4, 1/8, ...

100, 10, -1, 1/16, 1/100, -1/1000, 1/10,000, ...

The apparent ambiguity that is built into this definition of an algorithm is not meant 
to imply that actual algorithms are random in character. In actual implementation 
algorithms are not defined ambiguously. Indeed, a particular computer program 
executed twice from the same starting point will generate two copies of the same 
sequence. In other words, in practice algorithms are point-to-point mappings. The 
utility of the more general definition is that it allows one to analyze, in a single step, 
the convergence of an infinite family of similar algorithms. Thus, two computer pro- 
grams, designed from the same basic idea, may differ slightly in some details, and 
therefore perhaps may not produce identical results when given the same starting 
point. Both programs may, however, be regarded as implementations of the same
point-to-set mapping. In the example above, for instance, it is not necessary to
know exactly how $x_{k+1}$ is determined from $x_k$ so long as it is known that its absolute
value is no greater than one-half $x_k$'s absolute value. The result will always tend
toward zero. In this manner, the generalized concept of an algorithm sometimes leads
to simpler analysis. 
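The point about analyzing a whole family of selection rules at once can be illustrated with a short sketch. The code below draws an arbitrary (here random) element of A(x) = [-|x|/2, |x|/2] at each step and checks the bound |x_k| <= |x_0| / 2^k, which holds regardless of how the element is chosen. The use of a random selection rule and the names `A` and `run` are assumptions for illustration only.

```python
import random

def A(x):
    """Point-to-set map of Example 1: the interval [-|x|/2, |x|/2]."""
    half = abs(x) / 2.0
    return (-half, half)

def run(x0, steps, rng):
    """Generate one admissible sequence by picking an arbitrary point of A(x_k)."""
    xs = [x0]
    for _ in range(steps):
        lo, hi = A(xs[-1])
        xs.append(rng.uniform(lo, hi))
    return xs

for seed in range(3):
    xs = run(100.0, 30, random.Random(seed))
    # |x_k| <= |x_0| / 2^k holds for every admissible selection rule.
    assert all(abs(x) <= 100.0 / 2 ** k for k, x in enumerate(xs))
    print(seed, [round(x, 4) for x in xs[:6]], "...", xs[-1])
```

Different seeds give different sequences, yet each one tends toward zero, which is the only fact the convergence analysis needs.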



Descent 

In order to describe the idea of a descent algorithm we first must agree on a subset
$\Gamma$ of the space $X$, referred to as the solution set. The basic idea of a descent function,
which is defined below, is that for points outside the solution set, a single step of the
algorithm yields a decrease in the value of the descent function.

Definition. Let $\Gamma \subset X$ be a given solution set and let $\mathbf{A}$ be an algorithm on $X$. A continuous
real-valued function $Z$ on $X$ is said to be a descent function for $\Gamma$ and $\mathbf{A}$ if it satisfies

i) if $\mathbf{x} \notin \Gamma$ and $\mathbf{y} \in \mathbf{A}(\mathbf{x})$, then $Z(\mathbf{y}) < Z(\mathbf{x})$

ii) if $\mathbf{x} \in \Gamma$ and $\mathbf{y} \in \mathbf{A}(\mathbf{x})$, then $Z(\mathbf{y}) \le Z(\mathbf{x})$.

There are a number of ways a solution set, algorithm, and descent function can
be defined. A natural set-up for the problem

minimize $f(\mathbf{x})$    (7.19)
subject to $\mathbf{x} \in \Omega$

is to let $\Gamma$ be the set of minimizing points, and define an algorithm $\mathbf{A}$ on $\Omega$ in such a
way that $f$ decreases at each step and thereby serves as a descent function. Indeed,
this is the procedure followed in a majority of cases. Another possibility for
unconstrained problems is to let $\Gamma$ be the set of points $\mathbf{x}$ satisfying $\nabla f(\mathbf{x}) = \mathbf{0}$. In this case
we might design an algorithm for which $|\nabla f(\mathbf{x})|$ serves as a descent function or for
which $f(\mathbf{x})$ serves as a descent function.



* Closed Mappings 

An important property possessed by some algorithms is that they are closed. This 
property, which is a generalization for point-to-set mappings of the concept of con- 
tinuity for point-to-point mappings, turns out to be the key to establishing a gen- 
eral global convergence theorem. In defining this property we allow the point-to-set 
mapping to map points in one space X into subsets of another space Y. 

Definition. A point-to-set mapping $\mathbf{A}$ from $X$ to $Y$ is said to be closed at $\mathbf{x} \in X$ if the
assumptions

i) $\mathbf{x}_k \to \mathbf{x}$, $\mathbf{x}_k \in X$,

ii) $\mathbf{y}_k \to \mathbf{y}$, $\mathbf{y}_k \in \mathbf{A}(\mathbf{x}_k)$

imply

iii) $\mathbf{y} \in \mathbf{A}(\mathbf{x})$.

Fig. 7.6 Graphs of mappings (left panel: closed; right panel: not closed)

The point-to-set map $\mathbf{A}$ is said to be closed on $X$ if it is closed at each point of $X$.

Example 2. As a special case, suppose that the mapping $\mathbf{A}$ is a point-to-point
mapping; that is, for each $\mathbf{x} \in X$ the set $\mathbf{A}(\mathbf{x})$ consists of a single point in $Y$. Suppose also
that $\mathbf{A}$ is continuous at $\mathbf{x} \in X$. This means that if $\mathbf{x}_k \to \mathbf{x}$ then $\mathbf{A}(\mathbf{x}_k) \to \mathbf{A}(\mathbf{x})$, and
it follows that $\mathbf{A}$ is closed at $\mathbf{x}$. Thus for point-to-point mappings continuity implies
closedness. The converse is, however, not true in general.

The definition of a closed mapping can be visualized in terms of the graph of the
mapping, which is the set $\{(\mathbf{x}, \mathbf{y}) : \mathbf{x} \in X, \mathbf{y} \in \mathbf{A}(\mathbf{x})\}$. If $X$ is closed, then $\mathbf{A}$ is closed
throughout $X$ if and only if this graph is a closed set. This is illustrated in Fig. 7.6.
However, this equivalence is valid only when considering closedness everywhere.
In general a mapping may be closed at some points and not at others.

Example 3. The reader should verify that the point-to-set mapping defined in
Example 1 is closed.

Many complex algorithms that we analyze are most conveniently regarded as the 
composition of two or more simple point-to-set mappings. It is therefore natural to 
ask whether closedness of the individual maps implies closedness of the composite. 
The answer is a qualified “yes.” The technical details of composition are described 
in the remainder of this subsection. They can safely be omitted at first reading while 
proceeding to the Global Convergence Theorem. 

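Definition. Let $\mathbf{A} : X \to Y$ and $\mathbf{B} : Y \to Z$ be point-to-set mappings. The composite
mapping $\mathbf{C} = \mathbf{B}\mathbf{A}$ is defined as the point-to-set mapping $\mathbf{C} : X \to Z$ with

$$\mathbf{C}(\mathbf{x}) = \bigcup_{\mathbf{y} \in \mathbf{A}(\mathbf{x})} \mathbf{B}(\mathbf{y}).$$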



This definition is illustrated in Fig. 7.7. 

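Proposition. Let $\mathbf{A} : X \to Y$ and $\mathbf{B} : Y \to Z$ be point-to-set mappings. Suppose $\mathbf{A}$ is closed
at $\mathbf{x}$ and $\mathbf{B}$ is closed on $\mathbf{A}(\mathbf{x})$. Suppose also that if $\mathbf{x}_k \to \mathbf{x}$ and $\mathbf{y}_k \in \mathbf{A}(\mathbf{x}_k)$, there is a $\mathbf{y}$ such
that, for some subsequence $\{\mathbf{y}_{k_i}\}$, $\mathbf{y}_{k_i} \to \mathbf{y}$. Then the composite mapping $\mathbf{C} = \mathbf{B}\mathbf{A}$ is closed
at $\mathbf{x}$.

Proof. Let $\mathbf{x}_k \to \mathbf{x}$ and $\mathbf{z}_k \to \mathbf{z}$ with $\mathbf{z}_k \in \mathbf{C}(\mathbf{x}_k)$. It must be shown that $\mathbf{z} \in \mathbf{C}(\mathbf{x})$.

Select $\mathbf{y}_k \in \mathbf{A}(\mathbf{x}_k)$ such that $\mathbf{z}_k \in \mathbf{B}(\mathbf{y}_k)$ and according to the hypothesis let $\mathbf{y}$ and
$\{\mathbf{y}_{k_i}\}$ be such that $\mathbf{y}_{k_i} \to \mathbf{y}$. Since $\mathbf{A}$ is closed at $\mathbf{x}$ it follows that $\mathbf{y} \in \mathbf{A}(\mathbf{x})$.

Likewise, since $\mathbf{y}_{k_i} \to \mathbf{y}$, $\mathbf{z}_{k_i} \to \mathbf{z}$ and $\mathbf{B}$ is closed at $\mathbf{y}$, it follows that $\mathbf{z} \in \mathbf{B}(\mathbf{y}) \subset
\mathbf{B}\mathbf{A}(\mathbf{x}) = \mathbf{C}(\mathbf{x})$. ∎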

Two important corollaries follow immediately. 

Corollary 1. Let $\mathbf{A} : X \to Y$ and $\mathbf{B} : Y \to Z$ be point-to-set mappings. If $\mathbf{A}$ is closed at $\mathbf{x}$, $\mathbf{B}$
is closed on $\mathbf{A}(\mathbf{x})$ and $Y$ is compact, then the composite map $\mathbf{C} = \mathbf{B}\mathbf{A}$ is closed at $\mathbf{x}$.

Corollary 2. Let $\mathbf{A} : X \to Y$ be a point-to-point mapping and $\mathbf{B} : Y \to Z$ a point-to-set
mapping. If $\mathbf{A}$ is continuous at $\mathbf{x}$ and $\mathbf{B}$ is closed at $\mathbf{A}(\mathbf{x})$, then the composite mapping
$\mathbf{C} = \mathbf{B}\mathbf{A}$ is closed at $\mathbf{x}$.

Fig. 7.7 Composition of mappings



Global Convergence Theorem 

The Global Convergence Theorem is used to establish convergence for the following
general situation. There is a solution set $\Gamma$. Points are generated according to
the algorithm $\mathbf{x}_{k+1} \in \mathbf{A}(\mathbf{x}_k)$, and each new point always strictly decreases a descent
function $Z$ unless the solution set $\Gamma$ is reached. For example, in nonlinear programming,
the solution set may be the set of minimum points (perhaps only one point),
and the descent function may be the objective function itself. A suitable algorithm 
is found that generates points such that each new point strictly reduces the value of 
the objective. Then, under appropriate conditions, it follows that the sequence con- 
verges to the solution set. The Global Convergence Theorem establishes technical 
conditions for which convergence is guaranteed. 

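Global Convergence Theorem. Let $\mathbf{A}$ be an algorithm on $X$, and suppose that, given $\mathbf{x}_0$ the
sequence $\{\mathbf{x}_k\}_{k=0}^{\infty}$ is generated satisfying

$$\mathbf{x}_{k+1} \in \mathbf{A}(\mathbf{x}_k).$$

Let a solution set $\Gamma \subset X$ be given, and suppose

i) all points $\mathbf{x}_k$ are contained in a compact set $S \subset X$

ii) there is a continuous function $Z$ on $X$ such that

(a) if $\mathbf{x} \notin \Gamma$, then $Z(\mathbf{y}) < Z(\mathbf{x})$ for all $\mathbf{y} \in \mathbf{A}(\mathbf{x})$

(b) if $\mathbf{x} \in \Gamma$, then $Z(\mathbf{y}) \le Z(\mathbf{x})$ for all $\mathbf{y} \in \mathbf{A}(\mathbf{x})$

iii) the mapping $\mathbf{A}$ is closed at points outside $\Gamma$.

Then the limit of any convergent subsequence of $\{\mathbf{x}_k\}$ is a solution.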

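Proof. Suppose the convergent subsequence $\{\mathbf{x}_k\}$, $k \in \mathcal{K}$ converges to the limit $\mathbf{x}$.
Since $Z$ is continuous, it follows that for $k \in \mathcal{K}$, $Z(\mathbf{x}_k) \to Z(\mathbf{x})$. This means that $Z$ is
convergent with respect to the subsequence, and we shall show that it is convergent
with respect to the entire sequence. By the monotonicity of $Z$ on the sequence $\{\mathbf{x}_k\}$
we have $Z(\mathbf{x}_k) - Z(\mathbf{x}) \ge 0$ for all $k$. By the convergence of $Z$ on the subsequence,
there is, for a given $\varepsilon > 0$, a $K \in \mathcal{K}$ such that $Z(\mathbf{x}_k) - Z(\mathbf{x}) < \varepsilon$ for all $k > K$, $k \in \mathcal{K}$.

Thus for all $k > K$

$$Z(\mathbf{x}_k) - Z(\mathbf{x}) = Z(\mathbf{x}_k) - Z(\mathbf{x}_K) + Z(\mathbf{x}_K) - Z(\mathbf{x}) < \varepsilon,$$

which shows that $Z(\mathbf{x}_k) \to Z(\mathbf{x})$.

To complete the proof it is only necessary to show that $\mathbf{x}$ is a solution. Suppose
$\mathbf{x}$ is not a solution. Consider the subsequence $\{\mathbf{x}_{k+1}\}_{\mathcal{K}}$. Since all members of
this sequence are contained in a compact set, there is a $\bar{\mathcal{K}} \subset \mathcal{K}$ such that $\{\mathbf{x}_{k+1}\}_{\bar{\mathcal{K}}}$
converges to some limit $\bar{\mathbf{x}}$. We thus have $\mathbf{x}_k \to \mathbf{x}$, $k \in \bar{\mathcal{K}}$, and $\mathbf{x}_{k+1} \in \mathbf{A}(\mathbf{x}_k)$ with
$\mathbf{x}_{k+1} \to \bar{\mathbf{x}}$, $k \in \bar{\mathcal{K}}$. Thus since $\mathbf{A}$ is closed at $\mathbf{x}$ it follows that $\bar{\mathbf{x}} \in \mathbf{A}(\mathbf{x})$. But from
above, $Z(\bar{\mathbf{x}}) = Z(\mathbf{x})$ which contradicts the fact that $Z$ is a descent function. ∎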

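Corollary. If under the conditions of the Global Convergence Theorem $\Gamma$ consists of a
single point $\bar{\mathbf{x}}$, then the sequence $\{\mathbf{x}_k\}$ converges to $\bar{\mathbf{x}}$.

Proof. Suppose to the contrary that there is a subsequence $\{\mathbf{x}_k\}_{k \in \mathcal{K}}$ and an $\varepsilon > 0$ such
that $|\mathbf{x}_k - \bar{\mathbf{x}}| > \varepsilon$ for all $k \in \mathcal{K}$. By compactness there must be $\mathcal{K}' \subset \mathcal{K}$ such that
$\{\mathbf{x}_k\}_{\mathcal{K}'}$ converges, say to $\mathbf{x}'$. Clearly, $|\mathbf{x}' - \bar{\mathbf{x}}| \ge \varepsilon$, but by the Global Convergence
Theorem $\mathbf{x}' \in \Gamma$, which is a contradiction. ∎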

In later chapters the Global Convergence Theorem is used to establish the con- 
vergence of several standard algorithms. Here we consider some simple examples 
designed to illustrate the roles of the various conditions of the theorem. 

Example 4. In many respects condition (iii) of the theorem, the closedness of $\mathbf{A}$ outside
the solution set, is the most important condition. The failure of many popular
algorithms can be traced to nonsatisfaction of this condition. On the real line
consider the point-to-point algorithm

$$A(x) = \begin{cases} \frac{1}{2}(x - 1) + 1 & x > 1 \\ \frac{1}{2}x & x \le 1 \end{cases}$$

and the solution set $\Gamma = \{0\}$. It is easily verified that a descent function for this
solution set and this algorithm is $Z(x) = |x|$. However, starting from $x > 1$, the
algorithm generates a sequence converging to $x = 1$ which is not a solution. The
difficulty is that $A$ is not closed at $x = 1$.
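The behavior described in Example 4 is easy to reproduce. The sketch below iterates the map with exact rational arithmetic (an implementation choice made here, since floating-point rounding would eventually place an iterate exactly at 1 and switch branches). The names `A` and `iterate` are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def A(x):
    """Point-to-point map of Example 4 (discontinuous at x = 1)."""
    if x > 1:
        return HALF * (x - 1) + 1
    return HALF * x

def iterate(x0, steps):
    """Apply A repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = A(x)
    return x

# Z(x) = |x| decreases at every step from a start above 1 ...
x = Fraction(3)
for _ in range(10):
    assert abs(A(x)) < abs(x)
    x = A(x)

# ... yet the iterates approach 1, which is not in the solution set {0}.
print(float(iterate(Fraction(3), 40)))
# Starting at or below 1, the iterates approach 0 instead.
print(float(iterate(Fraction(1), 40)))
```

Every step lowers the descent function, so conditions (i) and (ii) of the theorem hold on a bounded start; only the closedness condition fails, at the jump.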

Example 5. On the real line $X$ consider the solution set to be empty, the descent
function $Z(x) = e^{-x}$, and the algorithm $A(x) = x + 1$. All conditions of the convergence
theorem except (i) hold. The sequence generated from any starting condition
diverges to infinity. This is not strictly a violation of the conclusion of the theorem
but simply an example illustrating that if no compactness assumption is introduced,
the generated sequence may have no convergent subsequence.

Example 6. Consider the point-to-set algorithm $A$ defined by the graph in Fig. 7.8
and given explicitly on $X = [0, 1]$ by

$$A(x) = \begin{cases} [0, x) & 1 \ge x > 0 \\ 0 & x = 0, \end{cases}$$

where $[0, x)$ denotes a half-open interval (see Appendix A). Letting $\Gamma = \{0\}$, the
function $Z(x) = x$ serves as a descent function, because for $x \ne 0$ all points in $A(x)$
are less than $x$.



Fig. 7.8 Graph for Example 6



The sequence defined by

$$x_0 = 1$$

$$x_{k+1} = x_k - \frac{1}{2^{k+2}}$$

satisfies $x_{k+1} \in A(x_k)$ but it can easily be seen that $x_k \to \frac{1}{2} \notin \Gamma$. The difficulty here,
of course, is that the algorithm $A$ is not closed outside the solution set.
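The sequence of Example 6 can be generated exactly to confirm both claims: every step is an admissible selection from A(x_k), and the limit is 1/2 rather than 0. This is a small sketch using rational arithmetic; the helper name `in_A` is an assumption introduced for the illustration.

```python
from fractions import Fraction

def in_A(y, x):
    """Membership test for A(x) = [0, x) if 0 < x <= 1, and {0} if x = 0."""
    return y == 0 if x == 0 else 0 <= y < x

x = Fraction(1)
for k in range(50):
    x_next = x - Fraction(1, 2 ** (k + 2))
    assert in_A(x_next, x)  # each step is an admissible selection
    x = x_next

print(float(x))  # close to 1/2, which is outside the solution set {0}
```

After 50 steps the iterate differs from 1/2 only by the tail of the geometric series, so the sequence stalls well away from the solution set even though the descent function decreases at every step.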



* Spacer Steps 

In some of the more complex algorithms presented in later chapters, the rule used to 
determine a succeeding point in an iteration may depend on several previous points 
rather than just the current point, or it may depend on the iteration index k. Such 
features are generally introduced in order to obtain a rapid rate of convergence but 
they can grossly complicate the analysis of global convergence.

If in such a complex sequence of steps there is inserted, perhaps irregularly but
infinitely often, a step of an algorithm such as steepest descent that is known to 
converge, then it is not difficult to insure that the entire complex process converges. 
The step which is repeated infinitely often and guarantees convergence is called a 
spacer step, since it separates disjoint portions of the complex sequence. Essentially 
the only requirement imposed on the other steps of the process is that they do not 
increase the value of the descent function. 

This type of situation can be analyzed easily from the following viewpoint. 
Suppose $\mathbf{B}$ is an algorithm which together with the descent function $Z$ and solution
set $\Gamma$, satisfies all the requirements of the Global Convergence Theorem. Define
the algorithm $\mathbf{C}$ by $\mathbf{C}(\mathbf{x}) = \{\mathbf{y} : Z(\mathbf{y}) \le Z(\mathbf{x})\}$. In other words, $\mathbf{C}$ applied to $\mathbf{x}$ can
give any point so long as it does not increase the value of Z. It is easy to verify that 
C is closed. We imagine that B represents the spacer step and the complex process 
between spacer steps is just some realization of C. Thus the overall process amounts 
merely to repeated applications of the composite algorithm CB. With this viewpoint 
we may state the Spacer Step Theorem. 

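Spacer Step Theorem. Suppose $\mathbf{B}$ is an algorithm on $X$ which is closed outside the solution
set $\Gamma$. Let $Z$ be a descent function corresponding to $\mathbf{B}$ and $\Gamma$.

Suppose that the sequence $\{\mathbf{x}_k\}_{k=0}^{\infty}$ is generated satisfying

$$\mathbf{x}_{k+1} \in \mathbf{B}(\mathbf{x}_k)$$

for $k$ in an infinite index set $\mathcal{K}$, and that

$$Z(\mathbf{x}_{k+1}) \le Z(\mathbf{x}_k)$$

for all $k$. Suppose also that the set $S = \{\mathbf{x} : Z(\mathbf{x}) \le Z(\mathbf{x}_0)\}$ is compact. Then the limit of any
convergent subsequence of $\{\mathbf{x}_k\}_{\mathcal{K}}$ is a solution.

Proof. We first define for any $\mathbf{x} \in X$, $\bar{\mathbf{B}}(\mathbf{x}) = S \cap \mathbf{B}(\mathbf{x})$ and then observe that $\mathbf{A} = \mathbf{C}\bar{\mathbf{B}}$
is closed outside the solution set by Corollary 1. The Global Convergence Theorem
can then be applied to $\mathbf{A}$. Since $S$ is compact, there is a subsequence of $\{\mathbf{x}_k\}_{k \in \mathcal{K}}$
converging to a limit $\mathbf{x}$. In view of the above we conclude that $\mathbf{x} \in \Gamma$. ∎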
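A minimal sketch of the spacer step idea follows. It assumes an illustrative quadratic objective, uses steepest descent with exact line search as the spacer step, and fills the remaining iterations with arbitrary moves that are accepted only when they do not increase the objective. The objective, the step schedule, and every function name here are assumptions for illustration, not part of the text.

```python
import random

def f(x):
    """Illustrative objective: f(x) = x1^2 + 10 x2^2, minimized at the origin."""
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad(x):
    return (2.0 * x[0], 20.0 * x[1])

def spacer_step(x):
    """Steepest descent with exact line search (the spacer step B)."""
    g = grad(x)
    gg = g[0] ** 2 + g[1] ** 2
    if gg == 0.0:
        return x
    # f = 0.5 x^T Q x with Q = diag(2, 20); the exact step is g^T g / g^T Q g.
    gQg = 2.0 * g[0] ** 2 + 20.0 * g[1] ** 2
    t = gg / gQg
    return (x[0] - t * g[0], x[1] - t * g[1])

def other_step(x, rng):
    """One realization of C: any move that does not increase f."""
    y = (x[0] + rng.uniform(-0.5, 0.5), x[1] + rng.uniform(-0.5, 0.5))
    return y if f(y) <= f(x) else x

def run(x0, iterations, spacer_every, seed=0):
    rng = random.Random(seed)
    x, values = x0, [f(x0)]
    for k in range(iterations):
        x = spacer_step(x) if k % spacer_every == 0 else other_step(x, rng)
        values.append(f(x))
    return x, values

x_final, values = run((5.0, 3.0), 300, 3)
print(x_final, values[-1])
```

Only every third iteration is a spacer step, and the intermediate moves are essentially arbitrary, yet the objective values are nonincreasing and tend to the minimum value, which is the pattern the theorem describes.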



7.8 Speed of Convergence 

The study of speed of convergence is an important but sometimes complex subject. 
Nevertheless, there is a rich and yet elementary theory of convergence rates that 
enables one to predict with confidence the relative effectiveness of a wide class of 
algorithms. In this section we introduce various concepts designed to measure speed 
of convergence, and prepare for a study of this most important aspect of nonlinear 
programming. 



Order of Convergence

Consider a sequence of real numbers $\{r_k\}_{k=0}^{\infty}$ converging to the limit $r^*$. We define
several notions related to the speed of convergence of such a sequence.



Definition. Let the sequence $\{r_k\}$ converge to $r^*$. The order of convergence of $\{r_k\}$ is defined
as the supremum of the nonnegative numbers $p$ satisfying

$$0 \le \limsup_{k \to \infty} \frac{|r_{k+1} - r^*|}{|r_k - r^*|^p} < \infty.$$



To ensure that the definition is applicable to any sequence, it is stated in terms
of limit superior rather than just limit and 0/0 (which occurs if $r_k = r^*$ for all $k$)
is regarded as finite. But these technicalities are rarely necessary in actual analysis,
since the sequences generated by algorithms are generally quite well behaved.

It should be noted that the order of convergence, as with all other notions related
to speed of convergence that are introduced, is determined only by the properties
of the sequence that hold as $k \to \infty$. Somewhat loosely but picturesquely, we are
therefore led to refer to the tail of a sequence, that part of the sequence that is
arbitrarily far out. In this language we might say that the order of convergence is a
measure of how good the worst part of the tail is. Larger values of the order $p$ imply,
in a sense, faster convergence, since the distance from the limit $r^*$ is reduced, at least
in the tail, by the $p$th power in a single step. Indeed, if the sequence has order $p$ and
(as is the usual case) the limit

$$\beta = \lim_{k \to \infty} \frac{|r_{k+1} - r^*|}{|r_k - r^*|^p}$$

exists, then asymptotically we have

$$|r_{k+1} - r^*| = \beta |r_k - r^*|^p.$$



Example 1. The sequence with $r_k = a^k$ where $0 < a < 1$ converges to zero with
order unity, since $r_{k+1}/r_k = a$.

Example 2. The sequence with $r_k = a^{(2^k)}$ for $0 < a < 1$ converges to zero with order
two, since $r_{k+1}/r_k^2 = 1$.
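The orders in Examples 1 and 2 can be estimated numerically. The sketch below uses a common heuristic, not taken from the text: when the limit in the definition exists and is positive, the ratio of successive log error ratios approaches the order p. The helper name `estimated_order` and the choice a = 0.5 are assumptions for illustration.

```python
import math

def estimated_order(errors):
    """Estimate p from the last three errors e_{k-1}, e_k, e_{k+1}.

    Heuristic: log(e_{k+1} / e_k) / log(e_k / e_{k-1}) tends to p
    when |e_{k+1}| behaves like beta * |e_k|^p in the tail.
    """
    e0, e1, e2 = errors[-3:]
    return math.log(e2 / e1) / math.log(e1 / e0)

a = 0.5
linear = [a ** k for k in range(1, 12)]           # Example 1: r_k = a^k
quadratic = [a ** (2 ** k) for k in range(1, 8)]  # Example 2: r_k = a^(2^k)

print(estimated_order(linear))     # close to 1
print(estimated_order(quadratic))  # close to 2
```

The estimate is only meaningful for well-behaved sequences with nonzero errors; it is a diagnostic, not a substitute for the supremum in the definition.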



Linear Convergence 



Most algorithms discussed in this book have an order of convergence equal to unity. 
It is therefore appropriate to consider this class in greater detail and distinguish 
certain cases within it. 



Definition. If the sequence $\{r_k\}$ converges to $r^*$ in such a way that

$$\lim_{k \to \infty} \frac{|r_{k+1} - r^*|}{|r_k - r^*|} = \beta < 1,$$

the sequence is said to converge linearly to $r^*$ with convergence ratio (or rate) $\beta$.

Linear convergence is, for our purposes, without doubt the most important type
of convergence behavior. A linearly convergent sequence, with convergence ratio $\beta$,
can be said to have a tail that converges at least as fast as the geometric sequence
$c\beta^k$ for some constant $c$. Thus linear convergence is sometimes referred to as
geometric convergence, although in this book we reserve that phrase for the case when
a sequence is exactly geometric.

As a rule, when comparing the relative effectiveness of two competing algorithms
both of which produce linearly convergent sequences, the comparison is based on
their corresponding convergence ratios: the smaller the ratio the faster the rate.
The ultimate case where $\beta = 0$ is referred to as superlinear convergence. We note
immediately that convergence of any order greater than unity is superlinear, but it is
also possible for superlinear convergence to correspond to unity order.

Example 3. The sequence $r_k = (1/k)^k$ is of order unity, since $r_{k+1}/r_k^p \to \infty$ for $p > 1$.
However, $r_{k+1}/r_k \to 0$ as $k \to \infty$ and hence this is superlinear convergence.
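Both limits in Example 3 can be observed numerically. Because $(1/k)^k$ underflows quickly in floating point, the sketch below works with logarithms of the ratios; the exponent p = 1.1 and the helper names are illustrative assumptions.

```python
import math

def log_r(k):
    """log of r_k = (1/k)^k, computed in the log domain to avoid underflow."""
    return -k * math.log(k)

def log_ratio(k, p):
    """log of r_{k+1} / r_k^p."""
    return log_r(k + 1) - p * log_r(k)

for k in (10, 100, 1000, 10000):
    # p = 1: the log ratio decreases without bound, so the ratio tends to 0.
    # p = 1.1: the log ratio eventually grows without bound.
    print(k, log_ratio(k, 1.0), log_ratio(k, 1.1))
```

For p slightly above one the ratio is still small at modest k and only blows up further out in the tail, a reminder that the order describes asymptotic behavior.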



Arithmetic Convergence 

Linear convergence is also called geometric convergence. There is another (slower)
type of convergence:

Definition. If the sequence $\{r_k\}$ converges to $r^*$ in such a way that

$$|r_k - r^*| \le C \frac{|r_0 - r^*|}{k^p}, \quad k \ge 1, \quad 0 < p < \infty,$$

where $C$ is a fixed positive number, the sequence is said to converge arithmetically to $r^*$
with order $p$.

When $p = 1$, it is referred to as arithmetic convergence. The greater $p$ is, the faster
the convergence.

Example 4. The sequence $r_k = 1/k$ converges to zero arithmetically. The convergence
is of order one but it is not linear, since $\lim_{k \to \infty} (r_{k+1}/r_k) = 1$, that is, $\beta$ is not
strictly less than one.
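A brief numerical look at Example 4, as a sketch: the step-wise ratio of $r_k = 1/k$ creeps up to one, while $k \, r_k$ stays bounded, which is the arithmetic bound with $p = 1$. The sample indices below are arbitrary choices for illustration.

```python
def r(k):
    """The sequence of Example 4: r_k = 1/k."""
    return 1.0 / k

indices = (1, 10, 100, 1000, 10000)
ratios = [r(k + 1) / r(k) for k in indices]  # step-wise ratios r_{k+1} / r_k
scaled = [k * r(k) for k in indices]         # k * r_k, bounded if p = 1 applies

print(ratios)
print(scaled)
```

No ratio strictly below one bounds the step-wise progress, so the sequence is not linearly convergent even though it converges.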



* Average Rates 



All the definitions given above can be referred to as step-wise concepts of conver- 
gence, since they define bounds on the progress made by going a single step: from k 
to k + 1 . Another approach is to define concepts related to the average progress per 
step over a large number of steps. We briefly illustrate how this can be done.

Definition. Let the sequence $\{r_k\}$ converge to $r^*$. The average order of convergence is the
infimum of the numbers $p > 1$ such that

$$\limsup_{k \to \infty} |r_k - r^*|^{1/p^k} = 1.$$

The order is infinity if the equality holds for no $p > 1$.

Example 5. For the sequence $r_k = a^{(2^k)}$, $0 < a < 1$, given in Example 2, we have

$$|r_k|^{1/2^k} = a,$$

while

$$|r_k|^{1/p^k} = a^{(2/p)^k} \to 1$$

for $p > 2$. Thus the average order is two.

Example 6. For $r_k = a^k$ with $0 < a < 1$ we have

$$(r_k)^{1/p^k} = a^{k(1/p)^k} \to 1$$

for any $p > 1$. Thus the average order is unity.

As before, the most important case is that of unity order, and in this case we
define the average convergence ratio as $\limsup_{k \to \infty} |r_k - r^*|^{1/k}$. Thus for the geometric
sequence $r_k = ca^k$, $0 < a < 1$, the average convergence ratio is $a$. Paralleling
the earlier definitions, the reader can then in a similar manner define corresponding
notions of average linear and average superlinear convergence.
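The average convergence ratio of a geometric sequence can be approximated by evaluating $|r_k - r^*|^{1/k}$ at increasing $k$. In this sketch the constants c = 5 and a = 0.3 and the helper name `average_ratio` are assumptions for illustration; the indices are kept moderate so that $a^k$ does not underflow.

```python
def average_ratio(error, k):
    """k-th root of the absolute error after k steps."""
    return abs(error) ** (1.0 / k)

c, a = 5.0, 0.3
indices = (10, 100, 500)
estimates = [average_ratio(c * a ** k, k) for k in indices]

print(estimates)  # approaches a = 0.3 as k grows; the constant c washes out
```

The estimate equals $c^{1/k} a$, so the constant c affects only how quickly the estimate settles, not its limit.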

Although the above array of definitions can be further embellished and expanded, 
it is quite adequate for our purposes. For the most part we work with the step-wise 
definitions, since in analyzing iterative algorithms it is natural to compare one step 
with the next. In most situations, moreover, when the sequences are well behaved 
and the limits exist in the definitions, then the step-wise and average concepts of 
convergence rates coincide. 



* Convergence of Vectors 

Suppose $\{\mathbf{x}_k\}_{k=0}^{\infty}$ is a sequence of vectors in $E^n$ converging to a vector $\mathbf{x}^*$. The
convergence properties of such a sequence are defined with respect to some particular
function that converts the sequence of vectors into a sequence of numbers. Thus,
if $f$ is a given continuous function on $E^n$, the convergence properties of $\{\mathbf{x}_k\}$ can
be defined with respect to $f$ by analyzing the convergence of $f(\mathbf{x}_k)$ to $f(\mathbf{x}^*)$. The
function $f$ used in this way to measure convergence is called the error function.

In optimization theory it is common to choose the error function by which to
measure convergence as the same function that defines the objective function of the
original optimization problem. This means we measure convergence by how fast the
objective converges to its minimum. Alternatively, we sometimes use the function
$|\mathbf{x} - \mathbf{x}^*|^2$ and thereby measure convergence by how fast the (squared) distance from
the solution point decreases to zero.

Generally, the order of convergence of a sequence is insensitive to the particular 
error function used; but for step-wise linear convergence the associated convergence 
ratio is not. Nevertheless, the average convergence ratio is not too sensitive, as the 
following proposition demonstrates, and hence the particular error function used to 
measure convergence is not really very important. 

Proposition. Let $f$ and $g$ be two error functions satisfying $f(\mathbf{x}^*) = g(\mathbf{x}^*) = 0$ and, for all $\mathbf{x}$,
a relation of the form

$$0 \le a_1 g(\mathbf{x}) \le f(\mathbf{x}) \le a_2 g(\mathbf{x})$$

for some fixed $a_1 > 0$, $a_2 > 0$. If the sequence $\{\mathbf{x}_k\}_{k=0}^{\infty}$ converges to $\mathbf{x}^*$ linearly with average
ratio $\beta$ with respect to one of these functions, it also does so with respect to the other.

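Proof. The statement is easily seen to be symmetric in $f$ and $g$. Thus we assume
$\{\mathbf{x}_k\}$ is linearly convergent with average convergence ratio $\beta$ with respect to $f$, and
will prove that the same is true with respect to $g$. We have

$$\beta = \lim_{k \to \infty} f(\mathbf{x}_k)^{1/k} \le \lim_{k \to \infty} a_2^{1/k} g(\mathbf{x}_k)^{1/k} = \lim_{k \to \infty} g(\mathbf{x}_k)^{1/k}$$

and

$$\beta = \lim_{k \to \infty} f(\mathbf{x}_k)^{1/k} \ge \lim_{k \to \infty} a_1^{1/k} g(\mathbf{x}_k)^{1/k} = \lim_{k \to \infty} g(\mathbf{x}_k)^{1/k}.$$

Thus

$$\beta = \lim_{k \to \infty} g(\mathbf{x}_k)^{1/k}. \qquad ∎$$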

As an example of an application of the above proposition, consider the case
where $g(\mathbf{x}) = |\mathbf{x} - \mathbf{x}^*|^2$ and $f(\mathbf{x}) = (\mathbf{x} - \mathbf{x}^*)^T \mathbf{Q} (\mathbf{x} - \mathbf{x}^*)$, where $\mathbf{Q}$ is a positive
definite symmetric matrix. Then $a_1$ and $a_2$ correspond, respectively, to the smallest and
largest eigenvalues of $\mathbf{Q}$. Thus average linear convergence is identical with respect
to any error function constructed from a positive definite quadratic form.
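The proposition can be illustrated with a small experiment. In this sketch the matrix `Q`, the direction `v`, the contraction factor `beta`, and the helper `average_ratio` are all assumed values chosen for illustration; the sequence is $\mathbf{x}_k = \beta^k \mathbf{v}$ with $\mathbf{x}^* = \mathbf{0}$, so both error functions decay like $\beta^{2k}$.

```python
import math

def average_ratio(values, k):
    """Estimate lim e_k**(1/k) from the single sample e_k = values(k)."""
    return math.exp(math.log(values(k)) / k)

Q = [[4.0, 1.0], [1.0, 3.0]]   # assumed positive definite example matrix
v = [1.0, -2.0]                # assumed direction of approach
beta = 0.8                     # x_k = beta**k * v, with x* = 0

def g(k):
    """Squared distance |x_k - x*|**2."""
    return beta ** (2 * k) * sum(vi * vi for vi in v)

def f(k):
    """Quadratic form x_k^T Q x_k."""
    quad = sum(v[i] * Q[i][j] * v[j] for i in range(2) for j in range(2))
    return beta ** (2 * k) * quad

k = 1000
ratio_g = average_ratio(g, k)
ratio_f = average_ratio(f, k)
print(ratio_g, ratio_f)   # both close to beta**2 = 0.64
```

The constant factors separating $f$ and $g$ are raised to the power $1/k$ and therefore wash out, which is the content of the proof.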



Complexity 



Complexity theory as outlined in Sect. 5.1 is an important aspect of convergence
theory. This theory can be used in conjunction with the theory of local convergence.
If an algorithm converges according to any order greater than zero, then for a fixed
problem, the sequence generated by the algorithm will converge in a time that is a
function of the convergence order (and rate, if convergence is linear). For example,
if the order is one with rate $0 < c < 1$ and the process begins with an error of $R$,
a final error of $r$ can be achieved by a number of steps $n$ satisfying $c^n R \le r$. Thus
it requires approximately $n = \log(R/r)/\log(1/c)$ steps. In this form the number of
steps is not affected by the size of the problem. However, problem size enters in
two possible ways. First, the rate $c$ may depend on the size, say going toward 1 as
the size increases, so that the speed is slower for large problems. The second way
that size may enter, and this is the more important way, is that the time to execute
a single step almost always increases with problem size. For instance if, for a
problem seeking an optimal vector of dimension $m$, each step requires a Gaussian
elimination inversion of an $m \times m$ matrix, the solution time will increase by a factor
proportional to $m^3$. Overall the algorithm is therefore a polynomial time algorithm.
Essentially all algorithms in this book employ steps, such as matrix multiplications
or inversion or other algebraic operations, which are polynomial-time in character.
Convergence analysis, therefore, focuses on whether an algorithm is globally
convergent, on its local convergence properties, and also on the order of the algebraic
operations required to execute the steps required. The last of these is usually easily
deduced by listing the number and size of the required vector and matrix operations.
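The step count for linear convergence is easy to evaluate. A minimal sketch follows; the function name and the sample values of $R$, $r$, and $c$ are assumptions for illustration.

```python
import math

def steps_needed(R, r, c):
    """Smallest integer n with c**n * R <= r, for 0 < c < 1 and R > r > 0."""
    return math.ceil(math.log(R / r) / math.log(1.0 / c))

n = steps_needed(R=1.0, r=1e-6, c=0.5)
print(n)
```

Note that the problem dimension does not appear in this count; it enters only through $c$ and through the cost of each step.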



7.9 Summary 



There are two different but complementary ways to characterize the solution to 
unconstrained optimization problems. In the local approach, one examines the re- 
lation of a given point to its neighbors. This leads to the conclusion that, at an 
unconstrained relative minimum point of a smooth function, the gradient of the 
function vanishes and the Hessian is positive semidefinite; and conversely, if at a 
point the gradient vanishes and the Hessian is positive definite, that point is a rel- 
ative minimum point. This characterization has a natural extension to the global 
approach where convexity ensures that if the gradient vanishes at a point, that point 
is a global minimum point. 

In considering iterative algorithms for finding either local or global minimum 
points, there are two distinct issues: global convergence properties and local con- 
vergence properties. The first is concerned with whether starting at an arbitrary 
point the sequence generated will converge to a solution. This is ensured if the 
algorithm is closed, has a descent function, and generates a bounded sequence. It 
is also explained that global convergence is guaranteed simply by the inclusion, in 
a complex algorithm, of spacer steps. This result is called upon frequently in what 
follows. Local convergence properties are a measure of the ultimate speed of convergence
and generally determine the relative advantage of one algorithm over another.



7.10 Exercises 

1. To approximate a function $g$ over the interval $[0, 1]$ by a polynomial $p$ of degree
$n$ (or less), we minimize the criterion

$$f(\mathbf{a}) = \int_0^1 [g(x) - p(x)]^2 \, dx,$$

where $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$. Find the equations satisfied by the
optimal coefficients $\mathbf{a} = (a_0, a_1, \ldots, a_n)$.
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2. In Example 4 of Sect. 7.2 show that if the solution has x\ > 0, x\ + x 2 - 1, then 
it is necessary that 

bi-b 2 + (> ci - c 2 )h(x i) = 0 
b 2 + (c 2 - c 2 )h(x\ + x 2 ) < 0. 

Hint: One way is to reformulate the problem in terms of the variables $x_1$ and
$y = x_1 + x_2$.

3. (a) Using the first-order necessary conditions, find a minimum point of the 

function 

$$f(x, y, z) = 2x^2 + xy + y^2 + yz + z^2 - 6x - 7y - 8z + 9.$$

(b) Verify that the point is a relative minimum point by verifying that the 
second-order sufficiency conditions hold. 

(c) Prove that the point is a global minimum point. 

4. In this exercise and the next we develop a method for determining whether a
given symmetric matrix is positive definite. Given an $n \times n$ matrix $\mathbf{A}$ let $\mathbf{A}_k$
denote the principal submatrix made up of the first $k$ rows and columns. Show
(by induction) that if the first $n - 1$ principal submatrices are nonsingular, then
there is a unique lower triangular matrix $\mathbf{L}$ with unit diagonal and a unique
upper triangular matrix $\mathbf{U}$ such that $\mathbf{A} = \mathbf{L}\mathbf{U}$. (See Appendix C.)

5. A symmetric matrix is positive definite if and only if the determinant of each
of its principal submatrices is positive. Using this fact and the considerations of
Exercise 4, show that an $n \times n$ symmetric matrix $\mathbf{A}$ is positive definite if and only
if it has an LU decomposition (without interchange of rows) and the diagonal
elements of $\mathbf{U}$ are all positive.

6. Using Exercise 5 show that an $n \times n$ matrix $\mathbf{A}$ is symmetric and positive definite
if and only if it can be written as $\mathbf{A} = \mathbf{G}\mathbf{G}^T$ where $\mathbf{G}$ is a lower triangular matrix
with positive diagonal elements. This representation is known as the Cholesky
factorization of $\mathbf{A}$.

7. Let $f_i$, $i \in I$ be a collection of convex functions defined on a convex set $\Omega$.
Show that the function $f$ defined by $f(\mathbf{x}) = \sup_{i \in I} f_i(\mathbf{x})$ is convex on the region
where it is finite.

8. Let $\gamma$ be a monotone nondecreasing function of a single variable (that is, $\gamma(r) \le
\gamma(r')$ for $r' > r$) which is also convex; and let $f$ be a convex function defined
on a convex set $\Omega$. Show that the function $\gamma(f)$ defined by $\gamma(f)(\mathbf{x}) = \gamma[f(\mathbf{x})]$ is
convex on $\Omega$.

9. Let $f$ be twice continuously differentiable on a region $\Omega \subset E^n$. Show that a
sufficient condition for a point $\mathbf{x}^*$ in the interior of $\Omega$ to be a relative minimum
point of $f$ is that $\nabla f(\mathbf{x}^*) = \mathbf{0}$ and that $f$ be locally convex at $\mathbf{x}^*$.
10. Define the point-to-set mapping on $E^n$ by

$$\mathbf{A}(\mathbf{x}) = \{\mathbf{y} : \mathbf{y}^T \mathbf{x} \le b\},$$

where $b$ is a fixed constant. Is $\mathbf{A}$ closed?

11. Prove the two corollaries in Sect. 7.6 on the closedness of composite mappings. 

12. Show that if A is a continuous point-to-point mapping, the Global Conver- 
gence Theorem is valid even without assumption (i). Compare with Example 2, 
Sect. 7.7. 

13. Let $\{r_k\}_{k=0}^{\infty}$ and $\{c_k\}_{k=0}^{\infty}$ be sequences of real numbers. Suppose $r_k \to 0$ average
linearly and that there are constants $c > 0$ and $C$ such that $c \le c_k \le C$ for all $k$.
Show that $c_k r_k \to 0$ average linearly.

14. Prove a proposition, similar to the one in Sect. 7.8, showing that the order of 
convergence is insensitive to the error function. 

15. Show that if $r_k \to r^*$ (step-wise) linearly with convergence ratio $\beta$, then $r_k \to
r^*$ (average) linearly with average convergence ratio no greater than $\beta$.
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gence Theorem, which captures the essence of many previously diverse 
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7.8 Most of the definitions given in this section have been standard for quite 
some time. A thorough discussion which contributes substantially to the 
unification of these concepts is contained in Ortega and Rheinboldt [07]. 



Chapter 8 

Basic Descent Methods 



We turn now to a description of the basic techniques used for iteratively solving 
unconstrained minimization problems. These techniques are, of course, important 
for practical application since they often offer the simplest, most direct alternatives 
for obtaining solutions; but perhaps their greatest importance is that they establish 
certain reference plateaus with respect to difficulty of implementation and speed 
of convergence. Thus in later chapters as more efficient techniques and techniques 
capable of handling constraints are developed, reference is continually made to the 
basic techniques of this chapter both for guidance and as points of comparison. 

There is a fundamental underlying structure for almost all the descent algorithms 
we discuss. One starts at an initial point; determines, according to a fixed rule, a 
direction of movement; and then moves in that direction to a (relative) minimum of 
the objective function on that line. At the new point a new direction is determined 
and the process is repeated. The primary differences between algorithms (steepest 
descent, Newton’s method, etc.) rest with the rule by which successive directions of 
movement are selected. Once the selection is made, all algorithms call for movement 
to the minimum point on the corresponding line. 

The process of determining the minimum point on a given line (one variable 
only) is called line search. For general nonlinear functions that cannot be minimized 
analytically, this process actually is accomplished by searching, in an intelligent 
manner, along the line for the minimum point. These line search techniques, which 
are really procedures for solving one-dimensional minimization problems, form the 
backbone of nonlinear programming algorithms, since higher dimensional problems 
are ultimately solved by executing a sequence of successive line searches. There are 
a number of different approaches to this important phase of minimization and the 
first half of this chapter is devoted to their discussion.

The last sections of the chapter are devoted to a description and analysis of
the basic descent algorithms for unconstrained problems: steepest descent, coordinate
descent, and Newton’s method. These algorithms serve as primary models
for the development and analysis of all others discussed in the book.



8.1 Line Search Algorithms 

These algorithms are classified by the order of information of the objective functions 
f(x) being evaluated. 



0th-Order Method: Golden Section Search and Curve Fitting

A very popular method for resolving the line search problem is the Fibonacci search 
method described in this section. The method has a certain degree of theoretical 
elegance, which no doubt partially accounts for its popularity, but on the whole, as 
we shall see, there are other procedures which in most circumstances are superior. 

The method determines the minimum value of a function $f$ over a closed interval
$[c_1, c_2]$. In applications, $f$ may in fact be defined over a broader domain, but for
this method a fixed interval of search must be specified. The only property that is
assumed of $f$ is that it is unimodal, that is, it has a single relative minimum (see
Fig. 8.1). The minimum point of $f$ is to be determined, at least approximately, by
measuring the value of $f$ at a certain number of points. It should be imagined, as is
indeed the case in the setting of nonlinear programming, that each measurement of
$f$ is somewhat costly, of time if nothing more.

To develop an appropriate search strategy, that is, a strategy for selecting
measurement points based on the previously obtained values, we pose the following
problem: Find how to successively select $N$ measurement points so that, without
explicit knowledge of $f$, we can determine the smallest possible region of uncertainty
in which the minimum must lie. In this problem the region of uncertainty is
determined in any particular case by the relative values of the measured points in
conjunction with our assumption that $f$ is unimodal. Thus, after values are known
at $N$ points $x_1, x_2, \ldots, x_N$ with

$$c_1 \le x_1 < x_2 < \cdots < x_{N-1} < x_N \le c_2,$$

the region of uncertainty is the interval $[x_{k-1}, x_{k+1}]$ where $x_k$ is the minimum point
among the $N$, and we define $x_0 = c_1$, $x_{N+1} = c_2$ for consistency. The minimum of $f$
must lie somewhere in this interval.
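The rule for reading off the region of uncertainty can be sketched in a few lines. The function name and argument layout below are assumptions made for illustration; the measured points are expected in increasing order.

```python
def region_of_uncertainty(c1, c2, xs, fs):
    """Interval [x_{k-1}, x_{k+1}] around the best of the measured points.

    xs must be sorted with c1 <= xs[0] < ... < xs[-1] <= c2, and fs[i] = f(xs[i]).
    """
    k = min(range(len(xs)), key=lambda i: fs[i])   # index of the lowest value
    padded = [c1] + list(xs) + [c2]                # x_0 = c1 and x_{N+1} = c2
    # xs[k] sits at padded[k + 1]; its neighbors bound the region.
    return padded[k], padded[k + 2]

# Illustrative data: f(x) = (x - 0.3)**2 measured at three points of [0, 1].
xs = [0.2, 0.5, 0.8]
fs = [(x - 0.3) ** 2 for x in xs]
lo, hi = region_of_uncertainty(0.0, 1.0, xs, fs)
print(lo, hi)
```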

The derivation of the optimal strategy for successively selecting measurement 
points to obtain the smallest region of uncertainty is fairly straight-forward but 
somewhat tedious. We simply state the result and give an example. 

Let 



$d_1 = c_2 - c_1$, the initial width of uncertainty
$d_k$ = width of uncertainty after $k$ measurements
Fig. 8.1 A unimodal function



Then, if a total of $N$ measurements are to be made, we have

$$d_k = \left(\frac{F_{N-k+1}}{F_N}\right) d_1, \tag{8.1}$$

where the integers $F_k$ are members of the Fibonacci sequence generated by the
recurrence relation

$$F_N = F_{N-1} + F_{N-2}, \qquad F_0 = F_1 = 1. \tag{8.2}$$

The resulting sequence is 1, 1, 2, 3, 5, 8, 13, ... .

The procedure for reducing the width of uncertainty to $d_N$ is this: The first two
measurements are made symmetrically at a distance of $(F_{N-1}/F_N) d_1$ from the ends
of the initial interval; according to which of these is of lesser value, an uncertainty
interval of width $d_2 = (F_{N-1}/F_N) d_1$ is determined. The third measurement point is
placed symmetrically in this new interval of uncertainty with respect to the
measurement already in the interval. The result of this third measurement gives an interval
of uncertainty $d_3 = (F_{N-2}/F_N) d_1$. In general, each successive measurement point
is placed in the current interval of uncertainty symmetrically with the point already
existing in that interval.

Some examples are shown in Fig. 8.2. In these examples the sequence of
measurement points is determined in accordance with the assumption that each
measurement is of lower value than its predecessors. Note that the procedure always calls
for the last two measurements to be made at the midpoint of the semifinal interval of
uncertainty. We are to imagine that these two points are actually separated a small
distance so that a comparison of their respective values will reduce the interval to
nearly half. This terminal anomaly of the Fibonacci search process is, of course, of
no great practical consequence.
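The widths predicted by the Fibonacci ratios can be tabulated directly. This is a small sketch with an assumed function name; it only evaluates the width formula $d_k = (F_{N-k+1}/F_N)\,d_1$ and does not perform the search itself.

```python
def fibonacci_widths(N, d1=1.0):
    """Widths d_k = (F_{N-k+1} / F_N) * d_1 for k = 1..N, with F_0 = F_1 = 1."""
    F = [1, 1]
    while len(F) <= N:              # build F_0, ..., F_N
        F.append(F[-1] + F[-2])
    return [F[N - k + 1] / F[N] * d1 for k in range(1, N + 1)]

widths = fibonacci_widths(N=8)
print(widths)   # starts at d_1 and ends at d_1 / F_8
```

With $N = 8$ the final width is $d_1/34$, since $F_8 = 34$ under the convention $F_0 = F_1 = 1$.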
Search by Golden Section

If the number N of allowed measurement points in a Fibonacci search is made to 
approach infinity, we obtain the golden section method. It can be argued, based on 
the optimal property of the finite Fibonacci method, that the corresponding infinite 
version yields a sequence of intervals of uncertainty whose widths tend to zero faster 
than that which would be obtained by other methods. 
Fig. 8.2 Fibonacci search



The solution to the Fibonacci difference equation

$$F_N = F_{N-1} + F_{N-2} \tag{8.3}$$

is of the form

$$F_N = A\tau_1^N + B\tau_2^N, \tag{8.4}$$

where $\tau_1$ and $\tau_2$ are roots of the characteristic equation

$$\tau^2 = \tau + 1.$$

Explicitly,

$$\tau_1 = \frac{1 + \sqrt{5}}{2}, \qquad \tau_2 = \frac{1 - \sqrt{5}}{2}.$$

(The number $\tau_1 \approx 1.618$ is known as the golden section ratio and was considered by
early Greeks to be the most aesthetic value for the ratio of two adjacent sides of a
rectangle.) For large $N$ the first term on the right side of (8.4) dominates the second,
and hence

$$\lim_{N \to \infty} \frac{F_{N-1}}{F_N} = \frac{1}{\tau_1} \approx 0.618.$$

It follows from (8.1) that the interval of uncertainty at any point in the process has
width

$$d_k = \left(\frac{1}{\tau_1}\right)^{k-1} d_1, \tag{8.5}$$

and from this it follows that

$$\frac{d_{k+1}}{d_k} = \frac{1}{\tau_1} = 0.618. \tag{8.6}$$

Therefore, we conclude that, with respect to the width of the uncertainty interval, the
search by golden section converges linearly (see Sect. 7.8) to the overall minimum
of the function $f$ with convergence ratio $1/\tau_1 = 0.618$.
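A common way to implement the golden section search is sketched below. The function name, the tolerance argument, and the choice to return the final bracket are assumptions of this sketch rather than details taken from the text; each pass reuses one interior point and shrinks the bracket by the factor $1/\tau_1$.

```python
import math

def golden_section_search(f, c1, c2, tol=1e-6):
    """Shrink [c1, c2] around the minimum of a unimodal f; return the bracket."""
    rho = 1.0 / ((1.0 + math.sqrt(5.0)) / 2.0)   # 1 / tau_1, about 0.618
    a, b = c1, c2
    x_lo = b - rho * (b - a)
    x_hi = a + rho * (b - a)
    f_lo, f_hi = f(x_lo), f(x_hi)
    while b - a > tol:
        if f_lo <= f_hi:
            # Minimum lies in [a, x_hi]; the old x_lo becomes the new x_hi.
            b, x_hi, f_hi = x_hi, x_lo, f_lo
            x_lo = b - rho * (b - a)
            f_lo = f(x_lo)
        else:
            # Minimum lies in [x_lo, b]; the old x_hi becomes the new x_lo.
            a, x_lo, f_lo = x_lo, x_hi, f_hi
            x_hi = a + rho * (b - a)
            f_hi = f(x_hi)
    return a, b

a, b = golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
print(a, b)   # a narrow bracket around x = 2
```

Only one new function value is needed per pass, which is what makes the fixed ratio attractive when measurements are costly.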

The Fibonacci search method has a certain amount of theoretical appeal, since it 
assumes only that the function being searched is unimodal and with respect to this 
broad class of functions the method is, in some sense, optimal. In most problems, 
however, it can be safely assumed that the function being searched, as well as being 
unimodal, possesses a certain degree of smoothness, and one might, therefore, ex- 
pect that more efficient search techniques exploiting this smoothness can be devised; 
and indeed they can. Techniques of this nature are usually based on curve fitting pro- 
cedures where a smooth curve is passed through the previously measured points in 
order to determine an estimate of the minimum point. A variety of such techniques 
can be devised depending on whether or not derivatives of the function as well as the 
values can be measured, how many previous points are used to determine the fit, and 
the criterion used to determine the fit. In this section a number of possibilities are 
outlined and analyzed. All of them have orders of convergence greater than unity. 



Quadratic Fit 



The scheme that is often most useful in line searching is that of fitting a quadratic
through three given points. This has the advantage of not requiring any derivative
information. Given $x_1, x_2, x_3$ and corresponding values $f(x_1) = f_1$, $f(x_2) = f_2$,
$f(x_3) = f_3$ we construct the quadratic passing through these points

$$q(x) = \sum_{i=1}^{3} f_i \, \frac{\prod_{j \ne i} (x - x_j)}{\prod_{j \ne i} (x_i - x_j)}, \tag{8.7}$$

and determine a new point $x_4$ as the point where the derivative of $q$ vanishes. Thus

$$x_4 = \frac{1}{2} \, \frac{b_{23} f_1 + b_{31} f_2 + b_{12} f_3}{a_{23} f_1 + a_{31} f_2 + a_{12} f_3}, \tag{8.8}$$

where $a_{ij} = x_i - x_j$, $b_{ij} = x_i^2 - x_j^2$.
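The formula for $x_4$ translates directly into code. The sketch below uses an assumed function name and adds a guard, not discussed in the text, for the degenerate case in which the three data points are collinear.

```python
def quadratic_fit_minimum(x1, x2, x3, f1, f2, f3):
    """Stationary point x_4 of the quadratic through three points, per (8.8)."""
    a23, a31, a12 = x2 - x3, x3 - x1, x1 - x2
    b23, b31, b12 = x2**2 - x3**2, x3**2 - x1**2, x1**2 - x2**2
    num = b23 * f1 + b31 * f2 + b12 * f3
    den = a23 * f1 + a31 * f2 + a12 * f3
    if den == 0.0:
        raise ValueError("points are collinear; the fitted quadratic is degenerate")
    return 0.5 * num / den

# For an exactly quadratic function the fit recovers the minimizer in one step.
f = lambda x: (x - 2.0) ** 2 + 1.0
x4 = quadratic_fit_minimum(0.0, 1.0, 3.0, f(0.0), f(1.0), f(3.0))
print(x4)
```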

Define the errors $\varepsilon_i = x^* - x_i$, $i = 1, 2, 3, 4$. The expression for $\varepsilon_4$ must be a
polynomial in $\varepsilon_1, \varepsilon_2, \varepsilon_3$. It must be second order (since it is a quadratic fit). It must
go to zero if any two of the errors $\varepsilon_1, \varepsilon_2, \varepsilon_3$ are zero. (The reader should check this.)
Finally, it must be symmetric (since the order of points is irrelevant). It follows that
near a minimum point $x^*$ of $f$, the errors are related approximately by

$$\varepsilon_4 = M(\varepsilon_1 \varepsilon_2 + \varepsilon_2 \varepsilon_3 + \varepsilon_1 \varepsilon_3), \tag{8.9}$$

where $M$ depends on the values of the second and third derivatives of $f$ at $x^*$.

If we assume that $\varepsilon_k \to 0$ with an order greater than unity, then for large $k$ the
error is governed approximately by

$$\varepsilon_{k+2} = M \varepsilon_k \varepsilon_{k-1}.$$

Letting $y_k = \log M\varepsilon_k$ this becomes

$$y_{k+2} = y_k + y_{k-1}$$

with characteristic equation

$$\lambda^3 - \lambda - 1 = 0.$$

The largest root of this equation is $\lambda \approx 1.3$ which thus determines the rate of growth
of $y_k$ and is the order of convergence of the quadratic fit method.
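The value of the largest root can be confirmed numerically. The bisection routine below is only an illustrative sketch with assumed names and bracket; the cubic has a single real root, and its two complex roots have smaller modulus.

```python
def largest_root(lo=1.0, hi=2.0, iters=60):
    """Bisection for the real root of lambda**3 - lambda - 1 = 0 in [lo, hi]."""
    h = lambda t: t**3 - t - 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0.0:   # sign change in the left half
            hi = mid
        else:                       # sign change in the right half
            lo = mid
    return 0.5 * (lo + hi)

order = largest_root()
print(order)   # roughly 1.32, the order of the quadratic fit method
```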



1st-Order Method: Curve Fitting and Methods of False Position



In this section a number of fitting methods using first derivative information are
described. All of them have orders of convergence greater than unity.



Quadratic Fit: Method of False Position 



Suppose that at two points $x_k$ and $x_{k-1}$ where measurements $f(x_k)$, $f'(x_k)$, $f'(x_{k-1})$
are available, it is possible to fit the quadratic

$$q(x) = f(x_k) + f'(x_k)(x - x_k) + \frac{f'(x_{k-1}) - f'(x_k)}{x_{k-1} - x_k} \cdot \frac{(x - x_k)^2}{2}$$

which has the same corresponding values. An estimate $x_{k+1}$ can then be determined
by finding the point where the derivative of $q$ vanishes; thus

$$x_{k+1} = x_k - f'(x_k) \left[ \frac{x_{k-1} - x_k}{f'(x_{k-1}) - f'(x_k)} \right]. \tag{8.10}$$

(See Fig. 8.3.) Comparing this formula with Newton’s method, we see again that
the value $f(x_k)$ does not enter; hence, our fit could have been passed through either
$f(x_k)$ or $f(x_{k-1})$. Also the formula can be regarded as an approximation to Newton’s
method where the second derivative is replaced by the difference of two first
derivatives.



Fig. 8.3 False 




Again, since this method does not depend on values of $f$ directly, it can be
regarded as a method for solving $f'(x) \equiv g(x) = 0$. Viewed in this way the method,
which is illustrated in Fig. 8.4, takes the form

$$x_{k+1} = x_k - g(x_k) \left[ \frac{x_k - x_{k-1}}{g(x_k) - g(x_{k-1})} \right]. \tag{8.11}$$
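The false position iteration for $g(x) = 0$ is short enough to sketch directly. The function name, the stopping rule, and the test equation are assumptions made for illustration; for minimization one would pass $g = f'$.

```python
def false_position(g, x_prev, x_curr, tol=1e-12, max_iter=50):
    """Iterate the false position update from two starting points."""
    xs = [x_prev, x_curr]
    for _ in range(max_iter):
        g_prev, g_curr = g(xs[-2]), g(xs[-1])
        if g_curr == g_prev:          # slope estimate undefined; stop
            break
        x_next = xs[-1] - g_curr * (xs[-1] - xs[-2]) / (g_curr - g_prev)
        xs.append(x_next)
        if abs(x_next - xs[-2]) < tol:
            break
    return xs

# Illustrative equation g(x) = x**2 - 2 with root sqrt(2).
xs = false_position(lambda x: x * x - 2.0, 1.0, 2.0)
print(xs[-1], len(xs))
```

No derivative of $g$ is evaluated; the difference quotient of the last two values stands in for it.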



We next investigate the order of convergence of the method of false position and
discover that it is order $\tau_1 \approx 1.618$, the golden mean.

Proposition. Let $g$ have a continuous second derivative and suppose $x^*$ is such that
$g(x^*) = 0$, $g'(x^*) \ne 0$. Then for $x_0$ sufficiently close to $x^*$, the sequence $\{x_k\}_{k=0}^{\infty}$
generated by the method of false position (8.11) converges to $x^*$ with order $\tau_1 \approx 1.618$.



Proof. Introducing the notation

$$g[a, b] = \frac{g(b) - g(a)}{b - a} \tag{8.12}$$
Fig. 8.4 False position for solving equations



we have

$$x_{k+1} - x^* = x_k - x^* - g(x_k) \left[ \frac{x_k - x_{k-1}}{g(x_k) - g(x_{k-1})} \right]
= (x_k - x^*) \left\{ \frac{g[x_{k-1}, x_k] - g[x_k, x^*]}{g[x_{k-1}, x_k]} \right\}. \tag{8.13}$$

Further, upon the introduction of the notation

$$g[a, b, c] = \frac{g[a, b] - g[b, c]}{a - c}$$

we may write (8.13) as

$$x_{k+1} - x^* = (x_k - x^*)(x_{k-1} - x^*) \left\{ \frac{g[x_{k-1}, x_k, x^*]}{g[x_{k-1}, x_k]} \right\}.$$

Now, by the mean value theorem with remainder, we have (see Exercise 2)

$$g[x_{k-1}, x_k] = g'(\xi_k) \tag{8.14}$$

and

$$g[x_{k-1}, x_k, x^*] = \tfrac{1}{2} g''(\eta_k), \tag{8.15}$$

where $\xi_k$ and $\eta_k$ are convex combinations of $x_k, x_{k-1}$ and $x_k, x_{k-1}, x^*$, respectively.
Thus

$$x_{k+1} - x^* = \frac{g''(\eta_k)}{2 g'(\xi_k)} (x_k - x^*)(x_{k-1} - x^*). \tag{8.16}$$

It follows immediately that the process converges if it is started sufficiently close
to $x^*$.

To determine the order of convergence, we note that for large $k$ Eq. (8.16) becomes
approximately

$$x_{k+1} - x^* = M (x_k - x^*)(x_{k-1} - x^*),$$

where

$$M = \frac{g''(x^*)}{2 g'(x^*)}.$$

Thus defining $\varepsilon_k = (x_k - x^*)$ we have, in the limit,

$$\varepsilon_{k+1} = M \varepsilon_k \varepsilon_{k-1}. \tag{8.17}$$

Taking the logarithm of this equation we have, with $y_k = \log M\varepsilon_k$,

$$y_{k+1} = y_k + y_{k-1}, \tag{8.18}$$

which is the Fibonacci difference equation discussed in Sect. 8.1. A solution to this
equation will satisfy

$$y_{k+1} - \tau_1 y_k \to 0.$$

Thus

$$\log M\varepsilon_{k+1} - \tau_1 \log M\varepsilon_k \to 0 \quad \text{or} \quad \log \frac{M\varepsilon_{k+1}}{(M\varepsilon_k)^{\tau_1}} \to 0,$$

and hence

$$\frac{\varepsilon_{k+1}}{\varepsilon_k^{\tau_1}} \to M^{(\tau_1 - 1)}. \qquad ∎$$
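The last step of the argument can be checked by iterating the recursion $y_{k+1} = y_k + y_{k-1}$ for the logarithms $y_k = \log M\varepsilon_k$. In this sketch the values of `M` and of the two starting errors are assumptions picked only for illustration, and the errors are tracked through their logarithms so that nothing underflows.

```python
import math

tau1 = (1.0 + math.sqrt(5.0)) / 2.0
M, eps_prev, eps_curr = 2.0, 0.1, 0.05     # assumed illustrative values
y = [math.log(M * eps_prev), math.log(M * eps_curr)]
for _ in range(30):
    y.append(y[-1] + y[-2])                # Fibonacci recursion for y_k

# y_{k+1} - tau1 * y_k should be close to zero for large k.
gap = y[-1] - tau1 * y[-2]
# log(eps_{k+1} / eps_k**tau1), written in terms of y, and its predicted limit.
log_ratio = (y[-1] - math.log(M)) - tau1 * (y[-2] - math.log(M))
limit = (tau1 - 1.0) * math.log(M)
print(gap, log_ratio, limit)
```

The component of $y_k$ along the second root $\tau_2$ decays like $|\tau_2|^k$, which is why the gap vanishes.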



Having derived the error formula (8.17) by direct analysis, it is now appropriate
to point out a short-cut technique, based on symmetry and other considerations,
that can sometimes be used in even more complicated situations. The right side of
error formula (8.17) must be a polynomial in $\varepsilon_k$ and $\varepsilon_{k-1}$, since it is derived from
approximations based on Taylor’s theorem. Furthermore, it must be second order,
since the method reduces to Newton’s method when $x_k = x_{k-1}$. Also, it must go
to zero if either $\varepsilon_k$ or $\varepsilon_{k-1}$ goes to zero, since the method clearly yields $\varepsilon_{k+1} = 0$ in
that case. Finally, it must be symmetric in $\varepsilon_k$ and $\varepsilon_{k-1}$, since the order of points is
irrelevant. The only formula satisfying these requirements is $\varepsilon_{k+1} = M\varepsilon_k \varepsilon_{k-1}$.



Cubic Fit 



Given the points $x_{k-1}$ and $x_k$ together with the values $f(x_{k-1})$, $f'(x_{k-1})$, $f(x_k)$, $f'(x_k)$,
it is also possible to fit a cubic equation to the points having corresponding values.
The next point $x_{k+1}$ can then be determined as the relative minimum point of this
cubic. This leads to

$$x_{k+1} = x_k - (x_k - x_{k-1}) \left[ \frac{f'(x_k) + u_2 - u_1}{f'(x_k) - f'(x_{k-1}) + 2u_2} \right], \tag{8.19}$$

where

$$u_1 = f'(x_{k-1}) + f'(x_k) - 3\, \frac{f(x_{k-1}) - f(x_k)}{x_{k-1} - x_k},$$

$$u_2 = \left[ u_1^2 - f'(x_{k-1}) f'(x_k) \right]^{1/2},$$

which is easily implementable for computations.

It can be shown (see Exercise 3) that the order of convergence of the cubic fit
method is 2.0. Thus, although the method is exact for cubic functions, indicating
that its order might be three, its order is actually only two.
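One step of the cubic fit can be sketched as follows. The function name and argument order are assumptions of this sketch, and it further assumes $x_{k-1} < x_k$, the ordering under which the positive square root in $u_2$ is the appropriate one; the text does not spell out this convention.

```python
import math

def cubic_fit_step(x_prev, x_curr, f_prev, f_curr, df_prev, df_curr):
    """One cubic fit step from values and derivatives at two points.

    Assumes x_prev < x_curr so that u2 is taken as the positive square root.
    """
    if not x_prev < x_curr:
        raise ValueError("this sketch assumes x_prev < x_curr")
    u1 = df_prev + df_curr - 3.0 * (f_prev - f_curr) / (x_prev - x_curr)
    disc = u1 * u1 - df_prev * df_curr
    if disc < 0.0:
        raise ValueError("no real stationary point is available from the fit")
    u2 = math.sqrt(disc)
    return x_curr - (x_curr - x_prev) * (df_curr + u2 - u1) / (df_curr - df_prev + 2.0 * u2)

# The step is exact for a cubic: f(x) = x**3 - 3x has its relative minimum at 1.
f = lambda x: x**3 - 3.0 * x
df = lambda x: 3.0 * x**2 - 3.0
x_next = cubic_fit_step(0.0, 2.0, f(0.0), f(2.0), df(0.0), df(2.0))
print(x_next)
```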



2nd-Order Method: Newton’s Method

Suppose that the function $f$ of a single variable $x$ is to be minimized, and suppose
that at a point $x_k$ where a measurement is made it is possible to evaluate the three
numbers $f(x_k)$, $f'(x_k)$, $f''(x_k)$. It is then possible to construct a quadratic function
$q$ which at $x_k$ agrees with $f$ up to second derivatives, that is

$$q(x) = f(x_k) + f'(x_k)(x - x_k) + \tfrac{1}{2} f''(x_k)(x - x_k)^2. \tag{8.20}$$

We may then calculate an estimate $x_{k+1}$ of the minimum point of $f$ by finding the
point where the derivative of $q$ vanishes. Thus setting

$$0 = q'(x_{k+1}) = f'(x_k) + f''(x_k)(x_{k+1} - x_k),$$




Fig. 8.5 Newton’s method for minimization 



we find

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}. \tag{8.21}$$

This process, which is illustrated in Fig. 8.5, can then be repeated at $x_{k+1}$.

We note immediately that the new point $x_{k+1}$ resulting from Newton’s method
does not depend on the value $f(x_k)$. The method can more simply be viewed as a
technique for iteratively solving equations of the form

$$g(x) = 0,$$

where, when applied to minimization, we put $g(x) \equiv f'(x)$. In this notation Newton’s
method takes the form

$$x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}. \tag{8.22}$$

This form is illustrated in Fig. 8.6.

We now show that Newton’s method has order two convergence: 

Proposition. Let the function $g$ have a continuous second derivative, and let $x^*$ satisfy
$g(x^*) = 0$, $g'(x^*) \ne 0$. Then, provided $x_0$ is sufficiently close to $x^*$, the sequence $\{x_k\}_{k=0}^{\infty}$
generated by Newton’s method (8.22) converges to $x^*$ with an order of convergence at least
two.

Proof. For points $\xi$ in a region near $x^*$ there is a $k_1$ such that $|g''(\xi)| < k_1$ and a $k_2$
such that $|g'(\xi)| > k_2$. Then since $g(x^*) = 0$ we can write

$$x_{k+1} - x^* = x_k - x^* - \frac{g(x_k) - g(x^*)}{g'(x_k)}
= -\left[ g(x_k) - g(x^*) + g'(x_k)(x^* - x_k) \right] / g'(x_k).$$




Fig. 8.6 Newton’s method for solving equations 



The term in brackets is, by Taylor’s theorem, zero to first-order. In fact, using the
remainder term in a Taylor series expansion about $x_k$, we obtain

$$x_{k+1} - x^* = \frac{1}{2} \frac{g''(\xi)}{g'(x_k)} (x_k - x^*)^2$$

for some $\xi$ between $x^*$ and $x_k$. Thus in the region near $x^*$,

$$|x_{k+1} - x^*| \le \frac{k_1}{2k_2} |x_k - x^*|^2.$$

We see that if $|x_k - x^*| k_1 / 2k_2 < 1$, then $|x_{k+1} - x^*| < |x_k - x^*|$ and thus we conclude
that if started close enough to the solution, the method will converge to $x^*$ with an
order of convergence at least two. ∎
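The squaring of the error at each step is easy to observe. The sketch below uses assumed names and an assumed test function, $f(x) = x - \ln x$ on $x > 0$ with minimum at $x^* = 1$; for this particular function the Newton update reduces to $x_{k+1} = 2x_k - x_k^2$, so the error is squared exactly.

```python
def newton_minimize(df, d2f, x0, tol=1e-12, max_iter=50):
    """Iterate x_{k+1} = x_k - f'(x_k)/f''(x_k); return all iterates."""
    xs = [x0]
    for _ in range(max_iter):
        curvature = d2f(xs[-1])
        if curvature == 0.0:          # step undefined; stop
            break
        xs.append(xs[-1] - df(xs[-1]) / curvature)
        if abs(xs[-1] - xs[-2]) < tol:
            break
    return xs

# Illustrative choice: f(x) = x - ln x, so f'(x) = 1 - 1/x and f''(x) = 1/x**2.
xs = newton_minimize(lambda x: 1.0 - 1.0 / x, lambda x: 1.0 / x**2, x0=0.5)
errors = [abs(x - 1.0) for x in xs]
print(errors)   # each error is about the square of the previous one
```

Starting farther away, for example at $x_0 = 2.5$, the same iteration leaves the domain $x > 0$, which illustrates why the proposition is only a local statement.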
Global Convergence of Curve Fitting



Above, we analyzed the convergence of various curve fitting procedures in the 
neighborhood of the solution point. If, however, any of these procedures were 
applied in pure form to search a line for a minimum, there is the danger — alas, 
the most likely possibility — that the process would diverge or wander about mean- 
inglessly. In other words, the process may never get close enough to the solution for 
our detailed local convergence analysis to be applicable. It is therefore important to 
artfully combine our knowledge of the local behavior with conditions guaranteeing 
global convergence to yield a workable and effective procedure. 

The key to guaranteeing global convergence is the Global Convergence Theorem 
of Chap. 7. Application of this theorem in turn hinges on the construction of a suit- 
able descent function and minor modifications of a pure curve fitting algorithm. We 
offer below a particular blend of this kind of construction and analysis, taking as 
departure point the quadratic fit procedure discussed in Sect. 8.1 above. 

Let us assume that the function $f$ that we wish to minimize is strictly unimodal
and has continuous second partial derivatives. We initiate our search procedure by
searching along the line until we find three points $x_1, x_2, x_3$ with $x_1 < x_2 < x_3$ such
that $f(x_1) \ge f(x_2) \le f(x_3)$. In other words, the value at the middle of these three
points is no greater than that at either end. Such a sequence of points can be determined in
a number of ways (see Exercise 7).

The main reason for using points having this pattern is that a quadratic fit to these
points will have a minimum (rather than a maximum) and the minimum point will
lie in the interval $[x_1, x_3]$. See Fig. 8.7. We modify the pure quadratic fit algorithm
so that it always works with points in this basic three-point pattern.

The point $x_4$ is calculated from the quadratic fit in the standard way and $f(x_4)$
is measured. Assuming (as in the figure) that $x_2 < x_4 < x_3$, and accounting for the
unimodal nature of $f$, there are but two possibilities:

1. $f(x_4) \le f(x_2)$

2. $f(x_2) < f(x_4) \le f(x_3)$.

In either case a new three-point pattern, $\bar{x}_1, \bar{x}_2, \bar{x}_3$, involving $x_4$ and two of the old
points, can be determined: In case 1 it is

$$(\bar{x}_1, \bar{x}_2, \bar{x}_3) = (x_2, x_4, x_3),$$

while in case 2 it is

$$(\bar{x}_1, \bar{x}_2, \bar{x}_3) = (x_1, x_2, x_4).$$

We then use this three-point pattern to fit another quadratic and continue. The pure
quadratic fit procedure determines the next point from the current point and the
previous two points. In the modification above, the next point is determined from
the current point and the two out of three last points that form a three-point pattern
with it. This simple modification leads to global convergence.
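The safeguarded procedure can be sketched as below. The function names, the fixed step count, and the stopping guards are assumptions of this sketch; the branch for $x_4 < x_2$ is the mirror image of the two cases listed above, which the text leaves implicit.

```python
import math

def fit_min(x1, x2, x3, f1, f2, f3):
    """Minimizer of the quadratic through three points, or None if degenerate."""
    num = (x2**2 - x3**2) * f1 + (x3**2 - x1**2) * f2 + (x1**2 - x2**2) * f3
    den = (x2 - x3) * f1 + (x3 - x1) * f2 + (x1 - x2) * f3
    return None if den == 0.0 else 0.5 * num / den

def three_point_search(f, x1, x2, x3, steps=10):
    """Quadratic fit restricted to three-point patterns f(x1) >= f(x2) <= f(x3)."""
    f1, f2, f3 = f(x1), f(x2), f(x3)
    for _ in range(steps):
        x4 = fit_min(x1, x2, x3, f1, f2, f3)
        if x4 is None or x4 == x2 or not (x1 < x4 < x3):
            break                               # degenerate fit; stop
        f4 = f(x4)
        if x4 > x2:
            if f4 <= f2:                        # case 1: keep (x2, x4, x3)
                x1, f1, x2, f2 = x2, f2, x4, f4
            else:                               # case 2: keep (x1, x2, x4)
                x3, f3 = x4, f4
        else:
            if f4 <= f2:                        # mirror of case 1: (x1, x4, x2)
                x3, f3, x2, f2 = x2, f2, x4, f4
            else:                               # mirror of case 2: (x4, x2, x3)
                x1, f1 = x4, f4
    return x1, x2, x3

# Illustrative unimodal function with minimizer ln 2.
f = lambda x: math.exp(x) - 2.0 * x
a, b, c = three_point_search(f, 0.0, 1.0, 2.0)
print(a, b, c)
```

Every branch preserves the pattern by construction, so the middle point always carries the lowest measured value.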
To prove convergence, we note that each three-point pattern can be thought of as
defining a vector $\mathbf{x}$ in $E^3$. Corresponding to an $\mathbf{x} = (x_1, x_2, x_3)$ such that $(x_1, x_2, x_3)$
form a three-point pattern with respect to $f$, we define $\mathbf{A}(\mathbf{x}) = (\bar{x}_1, \bar{x}_2, \bar{x}_3)$ as
discussed above. For completeness we must consider the case where two or more
of the $x_i$, $i = 1, 2, 3$ are equal, since this may occur. The appropriate definitions
are simply limiting cases of the earlier ones. For example, if $x_1 = x_2$, then
$(x_1, x_2, x_3)$ form a three-point pattern if $f(x_2) \le f(x_3)$ and $f'(x_2) \le 0$ (which
is the limiting case of $f(x_2) \le f(x_1)$). A quadratic is fit in this case by using the
values at the two distinct points and the derivative at the duplicated point. In case
$x_1 = x_2 = x_3$, $(x_1, x_2, x_3)$ forms a three-point pattern if $f'(x_2) = 0$ and $f''(x_2) \ge 0$.



Fig. 8.7 Three-point pattern 



With these definitions, the map A is well defined. It is also continuous, since curve 
fitting depends continuously on the data. 

We next define the solution set $\Gamma \subset E^3$ as the points $\mathbf{x}^* = (x^*, x^*, x^*)$ where
$f'(x^*) = 0$.

Finally, we let $Z(\mathbf{x}) = f(x_1) + f(x_2) + f(x_3)$. It is easy to see that $Z$ is a descent
function for $A$. After application of $A$ one of the values $f(x_1)$, $f(x_2)$, $f(x_3)$ will be
replaced by $f(x_4)$, and by construction, and the assumption that $f$ is unimodal, it will
replace a strictly larger value. Of course, at $\mathbf{x}^* = (x^*, x^*, x^*)$ we have $A(\mathbf{x}^*) = \mathbf{x}^*$
and hence $Z(A(\mathbf{x}^*)) = Z(\mathbf{x}^*)$.

Since all points are contained in the initial interval, we have all the requirements 
for the Global Convergence Theorem. Thus the process converges to the solution. 
The order of convergence may not be destroyed by this modification, if near the 
solution the three-point pattern is always formed from the previous three points. In 
this case we would still have convergence of order 1.3. This cannot be guaranteed,
however. 

It has often been implicitly suggested, and accepted, that when using the quadratic
fit technique one should require

$$f(x_{k+1}) < f(x_k)$$

so as to guarantee convergence. If the inequality is not satisfied at some cycle, then a
special local search is used to find a better $x_{k+1}$ that does satisfy it. This philosophy
amounts to taking $Z(\mathbf{x}) = f(x_3)$ in our general framework and, unfortunately, this
is not a descent function even for unimodal functions, and hence the special local
search is likely to be necessary several times. It is true, of course, that a similar
special local search may, occasionally, be required for the technique we suggest in
regions of multiple minima, but it is never required in a unimodal region.

The above construction, based on the pure quadratic fit technique, can be emulated
to produce effective procedures based on other curve fitting techniques. For
application to smooth functions these techniques seem to be the best available in
terms of flexibility to accommodate as much derivative information as is available,
fast convergence, and a guarantee of global convergence.



*Closedness of Line Search Algorithms 



Since searching along a line for a minimum point is a component part of most nonlinear
programming algorithms, it is desirable to establish at once that this procedure
is closed; that is, that the end product of the iterative procedures outlined
above, when viewed as a single algorithmic step finding a minimum along a line,
defines a closed algorithm. That is the objective of this section.

To initiate a line search with respect to a function $f$, two vectors must be specified:
the initial point $\mathbf{x}$ and the direction $\mathbf{d}$ in which the search is to be made. The
result of the search is a new point. Thus we define the search algorithm $S$ as a
mapping from $E^{2n}$ to $E^n$.

We assume that the search is to be made over the semi-infinite line emanating
from $\mathbf{x}$ in the direction $\mathbf{d}$. We also assume, for simplicity, that the search is not made
in vain; that is, we assume that there is a minimum point along the line. This will
be the case, for instance, if $f$ is continuous and increases without bound as $\mathbf{x}$ tends
toward infinity.

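Definition. The mapping $S : E^{2n} \to E^n$ is defined by

$$S(\mathbf{x}, \mathbf{d}) = \{\mathbf{y} : \mathbf{y} = \mathbf{x} + \alpha\mathbf{d} \text{ for some } \alpha \geq 0,\ f(\mathbf{y}) = \min_{0 \leq \alpha < \infty} f(\mathbf{x} + \alpha\mathbf{d})\}. \tag{8.23}$$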



In some cases there may be many vectors y yielding the minimum, so S is a set- 
valued mapping. We must verify that S is closed. 

Theorem. Let $f$ be continuous on $E^n$. Then the mapping defined by (8.23) is closed at $(\mathbf{x}, \mathbf{d})$
if $\mathbf{d} \neq \mathbf{0}$.

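Proof. Suppose $\{\mathbf{x}_k\}$ and $\{\mathbf{d}_k\}$ are sequences with $\mathbf{x}_k \to \mathbf{x}$, $\mathbf{d}_k \to \mathbf{d} \neq \mathbf{0}$. Suppose
also that $\mathbf{y}_k \in S(\mathbf{x}_k, \mathbf{d}_k)$ and that $\mathbf{y}_k \to \mathbf{y}$. We must show that $\mathbf{y} \in S(\mathbf{x}, \mathbf{d})$.

For each $k$ we have $\mathbf{y}_k = \mathbf{x}_k + \alpha_k \mathbf{d}_k$ for some $\alpha_k$. From this we may write

$$\alpha_k = \frac{|\mathbf{y}_k - \mathbf{x}_k|}{|\mathbf{d}_k|}.$$

Taking the limit of the right-hand side of the above, we see that

$$\alpha_k \to \bar{\alpha} \equiv \frac{|\mathbf{y} - \mathbf{x}|}{|\mathbf{d}|}.$$

It then follows that $\mathbf{y} = \mathbf{x} + \bar{\alpha}\mathbf{d}$. It still remains to be shown that $\mathbf{y} \in S(\mathbf{x}, \mathbf{d})$.
For each $k$ and each $\alpha$, $0 \leq \alpha < \infty$,

$$f(\mathbf{y}_k) \leq f(\mathbf{x}_k + \alpha\mathbf{d}_k).$$

Letting $k \to \infty$ we obtain

$$f(\mathbf{y}) \leq f(\mathbf{x} + \alpha\mathbf{d}).$$

Thus

$$f(\mathbf{y}) \leq \min_{0 \leq \alpha < \infty} f(\mathbf{x} + \alpha\mathbf{d}),$$

and hence $\mathbf{y} \in S(\mathbf{x}, \mathbf{d})$. ∎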

The requirement that $\mathbf{d} \neq \mathbf{0}$ is natural both theoretically and practically. From
a practical point of view this condition implies that, when constructing algorithms,
the choice $\mathbf{d} = \mathbf{0}$ had better occur only in the solution set; but it is clear that if $\mathbf{d} = \mathbf{0}$,
no search will be made. Theoretically, the map $S$ can fail to be closed at $\mathbf{d} = \mathbf{0}$, as
illustrated below.

Example. On $E^1$ define $f(x) = (x - 1)^2$. Then $S(x, d)$ is not closed at $x = 0$, $d = 0$.
To see this we note that for any $d > 0$

$$\min_{0 \leq \alpha < \infty} f(\alpha d) = f(1),$$

and hence

$$S(0, d) = 1;$$

but

$$\min_{0 \leq \alpha < \infty} f(\alpha \cdot 0) = f(0),$$

so that

$$S(0, 0) = 0.$$

Thus as $d \to 0$, $S(0, d) \not\to S(0, 0)$.
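
The failure of closedness at $d = 0$ can be seen numerically. The following sketch hard-codes the exact line search for the example function $f(x) = (x - 1)^2$; the function name `line_search_map` is hypothetical and the closed-form step is specific to this one-dimensional example.

```python
def line_search_map(x, d):
    """Exact S(x, d) for f(x) = (x - 1)**2 on E^1, searching over alpha >= 0."""
    if d == 0.0:
        return x                      # no search is made; the point stays put
    alpha = max(0.0, (1.0 - x) / d)   # unconstrained minimizer, clipped at 0
    return x + alpha * d

# S(0, d) stays at 1 however small d > 0 becomes, yet S(0, 0) = 0.
values = [line_search_map(0.0, d) for d in (1.0, 1e-3, 1e-9)]
limit_point = line_search_map(0.0, 0.0)
```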



Inaccurate Line Search 

In practice, of course, it is impossible to obtain the exact minimum point called 
for by the ideal line search algorithm S described above. As a matter of fact, it is 
often desirable to sacrifice accuracy in the line search routine in order to conserve
overall computation time. Because of these factors we must, to be realistic, be certain,
at every stage of development, that our theory does not crumble if inaccurate
line searches are introduced.

Inaccuracy generally is introduced in a line search algorithm by simply terminating
the search procedure before it has converged. The exact nature of the inaccuracy
introduced may therefore depend on the particular search technique employed
and the criterion used for terminating the search. We cannot develop a theory that
simultaneously covers every important version of inaccuracy without seriously
detracting from the underlying simplicity of the algorithms discussed later. For this
reason our general approach, which is admittedly more free-wheeling in spirit than
necessary but thereby more transparent and less encumbered than a detailed account
of inaccuracy, will be to analyze algorithms as if an accurate line search were
made at every step, and then point out in side remarks and exercises the effect of
inaccuracy.



Armijo’s Rule 

A practical and popular criterion for terminating a line search is Armijo’s rule. The
essential idea is that the rule should first guarantee that the selected $\alpha$ is not too
large, and next it should not be too small. Let us define the function

$$\phi(\alpha) = f(\mathbf{x}_k + \alpha\mathbf{d}_k).$$

Armijo’s rule is implemented by consideration of the function $\phi(0) + \varepsilon\phi'(0)\alpha$ for
fixed $\varepsilon$, $0 < \varepsilon < 1$. This function is shown in Fig. 8.8a as the dashed line. A value
of $\alpha$ is considered to be not too large if the corresponding function value lies below
the dashed line; that is, if

$$\phi(\alpha) \leq \phi(0) + \varepsilon\phi'(0)\alpha. \tag{8.24}$$

To insure that $\alpha$ is not too small, a value $\eta > 1$ is selected, and $\alpha$ is then considered
to be not too small if

$$\phi(\eta\alpha) > \phi(0) + \varepsilon\phi'(0)\eta\alpha.$$

This means that if $\alpha$ is increased by the factor $\eta$, it will fail to meet the test (8.24).
The acceptable region defined by the Armijo rule is shown in Fig. 8.8a when $\eta = 2$
(other rules can be adapted in the same way).

Sometimes in practice, the Armijo test is used to define a simplified line search
technique that does not employ curve fitting methods. One begins with an arbitrary $\alpha$.
If it satisfies (8.24), it is repeatedly increased by $\eta$ ($\eta = 2$ or $\eta = 10$ and $\varepsilon = 0.2$ are
often used) until (8.24) is not satisfied, and then the penultimate $\alpha$ is selected. If, on
the other hand, the original $\alpha$ does not satisfy (8.24), it is repeatedly divided by $\eta$
until the resulting $\alpha$ does satisfy (8.24).
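
The simplified search just described might be coded as follows. This is a minimal sketch, not a production routine: the name `armijo_step`, the iteration cap `max_iter`, and the sample function $\phi(\alpha) = (\alpha - 1)^2$ are assumptions, and the caller is assumed to supply a descent direction so that $\phi'(0) < 0$.

```python
def armijo_step(phi, dphi0, alpha=1.0, eps=0.2, eta=2.0, max_iter=60):
    """Simplified Armijo search: grow or shrink alpha by the factor eta."""
    phi0 = phi(0.0)

    def not_too_large(a):
        # Test (8.24): phi(a) lies on or below the line phi(0) + eps*phi'(0)*a.
        return phi(a) <= phi0 + eps * dphi0 * a

    if not_too_large(alpha):
        # Increase until the next trial would fail; keep the penultimate value.
        for _ in range(max_iter):
            if not not_too_large(eta * alpha):
                break
            alpha *= eta
    else:
        # Decrease until (8.24) is first satisfied.
        for _ in range(max_iter):
            alpha /= eta
            if not_too_large(alpha):
                break
    return alpha

# Assumed example: phi(alpha) = (alpha - 1)**2, so phi(0) = 1 and phi'(0) = -2.
phi = lambda a: (a - 1.0) ** 2
steps = [armijo_step(phi, -2.0, alpha=a0) for a0 in (1.0, 8.0, 0.25)]
```

For this example the test (8.24) holds exactly when $\alpha \leq 1.6$, so each of the three starting values ends at $\alpha = 1$, which is acceptable while $\eta\alpha = 2$ is not.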

One of the oldest and most widely known methods for minimizing a function
of several variables is the method of steepest descent (often referred to as the
gradient method). The method is extremely important from a theoretical viewpoint,
since it is one of the simplest for which a satisfactory analysis exists. More
advanced algorithms are often motivated by an attempt to modify the basic steepest
descent technique in such a way that the new algorithm will have superior
convergence properties. The method of steepest descent remains, therefore, not only
the technique most often first tried on a new problem but also the standard of reference
against which other techniques are measured. The principles used for its
analysis will be used throughout this book.



The Method 

Let $f$ have continuous first partial derivatives on $E^n$. We will frequently have need
for the gradient vector of $f$ and therefore we introduce some simplifying notation.
The gradient $\nabla f(\mathbf{x})$ is, according to our conventions, defined as an $n$-dimensional row
vector. For convenience we define the $n$-dimensional column vector $\mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x})^T$.
When there is no chance for ambiguity, we sometimes suppress the argument $\mathbf{x}$ and,
for example, write $\mathbf{g}_k$ for $\mathbf{g}(\mathbf{x}_k) = \nabla f(\mathbf{x}_k)^T$.

The method of steepest descent is defined by the iterative algorithm

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\mathbf{g}_k,$$

where the stepsize $\alpha_k$ is a nonnegative scalar possibly minimizing $f(\mathbf{x}_k - \alpha\mathbf{g}_k)$. In words,
from the point $\mathbf{x}_k$ we search along the direction of the negative gradient $-\mathbf{g}_k$ to a
minimum point on this line; this minimum point is taken to be $\mathbf{x}_{k+1}$.

In formal terms, the overall algorithm $A : E^n \to E^n$ which gives $\mathbf{x}_{k+1} \in A(\mathbf{x}_k)$
can be decomposed in the form $A = SG$. Here $G : E^n \to E^{2n}$ is defined by $G(\mathbf{x}) =
(\mathbf{x}, -\mathbf{g}(\mathbf{x}))$, giving the initial point and direction of a line search. This is followed by
the line search $S : E^{2n} \to E^n$ defined in Sect. 8.1.
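
The decomposition $A = SG$ translates directly into code. In the sketch below the line search $S$ is approximated by a golden section search over $[0, \alpha_{\max}]$; that choice, the bound `alpha_max = 1.0`, the helper names, and the two-variable test function are all assumptions made for illustration, and the search is only valid when the minimizing step lies inside the assumed interval.

```python
import math

def golden_section(phi, lo, hi, tol=1e-10):
    """Minimize a unimodal phi on [lo, hi] by golden section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

def G(grad, x):
    """G(x) = (x, -g(x)): starting point and search direction."""
    return x, [-gi for gi in grad(x)]

def S(f, x, d, alpha_max=1.0):
    """Approximate line search along x + alpha*d, 0 <= alpha <= alpha_max."""
    phi = lambda a: f([xi + a * di for xi, di in zip(x, d)])
    a = golden_section(phi, 0.0, alpha_max)
    return [xi + a * di for xi, di in zip(x, d)]

def A(f, grad, x):
    """One steepest descent step, A = S G."""
    return S(f, *G(grad, x))

# Assumed test problem with minimum at (1, -0.5).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

x = [0.0, 0.0]
for _ in range(50):
    x = A(f, grad, x)
```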



Global Convergence and Convergence Speed 

It was shown in Sect. 8.1 that $S$ is closed if $\nabla f(\mathbf{x}) \neq \mathbf{0}$, and it is clear that $G$ is
continuous. Therefore, by Corollary 2 in Sect. 7.7, $A$ is closed.

We define the solution set to be the points $\mathbf{x}$ where $\nabla f(\mathbf{x}) = \mathbf{0}$. Then $Z(\mathbf{x}) = f(\mathbf{x})$
is a descent function for $A$, since for $\nabla f(\mathbf{x}) \neq \mathbf{0}$

$$\min_{0 \leq \alpha < \infty} f(\mathbf{x} - \alpha\mathbf{g}(\mathbf{x})) < f(\mathbf{x}).$$

Fig. 8.8 Stopping rules. (a) Armijo rule. (b) Goldstein test. (c) Wolfe test

Thus by the Global Convergence Theorem, if the sequence $\{\mathbf{x}_k\}$ is bounded, it will
have limit points and each of these is a solution. What about the convergence speed?
Assume that $f(\mathbf{x})$ is convex and differentiable everywhere, admits a minimizer $\mathbf{x}^*$,
and satisfies the (first-order) $\beta$-Lipschitz condition, that is, for any two points $\mathbf{x}$ and $\mathbf{y}$

$$|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})| \leq \beta|\mathbf{x} - \mathbf{y}|$$

for a positive real number $\beta$. Starting from any point $\mathbf{x}_0$, we consider the method of
steepest descent with a fixed step size $1/\beta$ for all $k$:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \frac{1}{\beta}\mathbf{g}_k = \mathbf{x}_k - \frac{1}{\beta}\nabla f(\mathbf{x}_k)^T. \tag{8.25}$$



We first prove a lemma.

Lemma 1. Let $f(\mathbf{x})$ be differentiable everywhere and satisfy the (first-order) $\beta$-Lipschitz
condition. Then, for any two points $\mathbf{x}$ and $\mathbf{y}$

$$f(\mathbf{x}) - f(\mathbf{y}) - \nabla f(\mathbf{y})(\mathbf{x} - \mathbf{y}) \leq \frac{\beta}{2}|\mathbf{x} - \mathbf{y}|^2.$$

Then we prove

Theorem 1 (Steepest Descent: Lipschitz Convex Case). Let $f(\mathbf{x})$ be convex and differentiable
everywhere, satisfy the (first-order) $\beta$-Lipschitz condition, and admit a minimizer $\mathbf{x}^*$.
Then, the method of steepest descent (8.25) generates a sequence of solutions $\mathbf{x}_k$ such that

$$|\nabla f(\mathbf{x}_k)| \leq \frac{\beta}{\sqrt{k(k+1)}}|\mathbf{x}_0 - \mathbf{x}^*|$$

and

$$f(\mathbf{x}_k) - f(\mathbf{x}^*) \leq \frac{\beta}{2(k+1)}|\mathbf{x}_0 - \mathbf{x}^*|^2.$$



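Proof. Consider the function $g_{\mathbf{x}}(\mathbf{y}) = f(\mathbf{y}) - \nabla f(\mathbf{x})\mathbf{y}$ for any given $\mathbf{x}$. Note that $g_{\mathbf{x}}$ is
also convex and satisfies the $\beta$-Lipschitz condition. Moreover, $\mathbf{x}$ is the minimizer of
$g_{\mathbf{x}}(\mathbf{y})$ and $\nabla g_{\mathbf{x}}(\mathbf{y}) = \nabla f(\mathbf{y}) - \nabla f(\mathbf{x})$.

Applying Lemma 1 to $g_{\mathbf{x}}$ and noting the relations of $g_{\mathbf{x}}$ and $f(\mathbf{x})$, we have

$$\begin{aligned}
f(\mathbf{x}) - f(\mathbf{y}) - \nabla f(\mathbf{x})(\mathbf{x} - \mathbf{y}) &= g_{\mathbf{x}}(\mathbf{x}) - g_{\mathbf{x}}(\mathbf{y}) \\
&\leq g_{\mathbf{x}}\!\left(\mathbf{y} - \tfrac{1}{\beta}\nabla g_{\mathbf{x}}(\mathbf{y})^T\right) - g_{\mathbf{x}}(\mathbf{y}) \\
&\leq \nabla g_{\mathbf{x}}(\mathbf{y})\left(-\tfrac{1}{\beta}\nabla g_{\mathbf{x}}(\mathbf{y})^T\right) + \tfrac{\beta}{2}\cdot\tfrac{1}{\beta^2}|\nabla g_{\mathbf{x}}(\mathbf{y})|^2 \\
&= -\tfrac{1}{2\beta}|\nabla g_{\mathbf{x}}(\mathbf{y})|^2 \\
&= -\tfrac{1}{2\beta}|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})|^2.
\end{aligned} \tag{8.26}$$

Similarly, we have

$$f(\mathbf{y}) - f(\mathbf{x}) - \nabla f(\mathbf{y})(\mathbf{y} - \mathbf{x}) \leq -\frac{1}{2\beta}|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})|^2.$$

Adding the above two derived inequalities, we have for any $\mathbf{x}$ and $\mathbf{y}$:

$$(\nabla f(\mathbf{x}) - \nabla f(\mathbf{y}))(\mathbf{x} - \mathbf{y}) \geq \frac{1}{\beta}|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})|^2. \tag{8.27}$$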



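For simplification, in what follows let $\mathbf{d}_k = \mathbf{x}_k - \mathbf{x}^*$ and $\delta_k = [f(\mathbf{x}_k) - f(\mathbf{x}^*)] \geq 0$.
Now let $\mathbf{x} = \mathbf{x}_{k+1}$ and $\mathbf{y} = \mathbf{x}_k$ in (8.27). Then

$$-\frac{1}{\beta}(\mathbf{g}_k)^T(\mathbf{g}_{k+1} - \mathbf{g}_k) = (\mathbf{x}_{k+1} - \mathbf{x}_k)^T(\mathbf{g}_{k+1} - \mathbf{g}_k) \geq \frac{1}{\beta}|\mathbf{g}_{k+1} - \mathbf{g}_k|^2,$$

which leads to

$$|\mathbf{g}_{k+1}|^2 \leq (\mathbf{g}_{k+1})^T\mathbf{g}_k \leq |\mathbf{g}_{k+1}||\mathbf{g}_k|, \quad \text{that is} \quad |\mathbf{g}_{k+1}| \leq |\mathbf{g}_k|. \tag{8.28}$$

Inequality (8.28) implies that $|\mathbf{g}_k| = |\nabla f(\mathbf{x}_k)|$ is monotonically decreasing.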

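Applying inequality (8.26) for $\mathbf{x} = \mathbf{x}_k$ and $\mathbf{y} = \mathbf{x}^*$ and noting $\mathbf{g}^* = \mathbf{0}$ we have

$$\begin{aligned}
\delta_k &\leq (\mathbf{g}_k)^T\mathbf{d}_k - \tfrac{1}{2\beta}|\mathbf{g}_k|^2 \\
&= -\beta(\mathbf{x}_{k+1} - \mathbf{x}_k)^T\mathbf{d}_k - \tfrac{\beta}{2}|\mathbf{x}_{k+1} - \mathbf{x}_k|^2 \\
&= -\tfrac{\beta}{2}\left(|\mathbf{x}_{k+1} - \mathbf{x}_k|^2 + 2(\mathbf{x}_{k+1} - \mathbf{x}_k)^T\mathbf{d}_k\right) \\
&= -\tfrac{\beta}{2}\left(|\mathbf{d}_{k+1} - \mathbf{d}_k|^2 + 2(\mathbf{d}_{k+1} - \mathbf{d}_k)^T\mathbf{d}_k\right) \\
&= \tfrac{\beta}{2}\left(|\mathbf{d}_k|^2 - |\mathbf{d}_{k+1}|^2\right).
\end{aligned} \tag{8.29}$$

Summing up (8.29) from 0 to $k$, we have

$$\sum_{l=0}^{k}\delta_l \leq \frac{\beta}{2}\left(|\mathbf{d}_0|^2 - |\mathbf{d}_{k+1}|^2\right) \leq \frac{\beta}{2}|\mathbf{d}_0|^2. \tag{8.30}$$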

1=0 

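Using (8.26) again for $\mathbf{x} = \mathbf{x}_{k+1}$ and $\mathbf{y} = \mathbf{x}_k$ and noting (8.25) we have

$$\begin{aligned}
\delta_{k+1} - \delta_k &= f(\mathbf{x}_{k+1}) - f(\mathbf{x}_k) \\
&\leq \mathbf{g}_{k+1}^T\left(-\tfrac{1}{\beta}\mathbf{g}_k\right) - \tfrac{1}{2\beta}|\mathbf{g}_{k+1} - \mathbf{g}_k|^2 \\
&= -\tfrac{1}{2\beta}\left(|\mathbf{g}_{k+1}|^2 + |\mathbf{g}_k|^2\right).
\end{aligned} \tag{8.31}$$

Noting that (8.31) holds for all $k$, we have

$$\begin{aligned}
\sum_{l=0}^{k}\delta_l &= \sum_{l=0}^{k}(l + 1 - l)\delta_l \\
&= \sum_{l=0}^{k}(l + 1)\delta_l - \sum_{l=0}^{k} l\,\delta_l \\
&= \sum_{l=1}^{k+1} l\,\delta_{l-1} - \sum_{l=1}^{k} l\,\delta_l \\
&= \delta_k(k + 1) + \sum_{l=1}^{k} l(\delta_{l-1} - \delta_l) \\
&\geq \delta_k(k + 1) + \sum_{l=1}^{k}\tfrac{l}{2\beta}\left(|\mathbf{g}_l|^2 + |\mathbf{g}_{l-1}|^2\right) \\
&\geq \delta_k(k + 1) + \tfrac{k(k+1)}{2\beta}|\mathbf{g}_k|^2,
\end{aligned}$$

where the last inequality comes from the fact that $|\mathbf{g}_k| = |\nabla f(\mathbf{x}_k)|$ is monotonically decreasing.

Using (8.30) we finally have

$$(k + 1)\delta_k + \frac{k(k+1)}{2\beta}|\mathbf{g}_k|^2 \leq \frac{\beta}{2}|\mathbf{d}_0|^2. \tag{8.32}$$

Inequality (8.32), from $\delta_k = f(\mathbf{x}_k) - f(\mathbf{x}^*) \geq 0$ and $\mathbf{d}_0 = \mathbf{x}_0 - \mathbf{x}^*$, proves the desired
bounds. ∎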
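
The two bounds of Theorem 1 can be checked numerically on a small example. The sketch below assumes the test function $f(\mathbf{x}) = \frac{1}{2}(x_1^2 + 10x_2^2)$, for which $\beta = 10$ and $\mathbf{x}^* = \mathbf{0}$; the function names and the starting point are illustrative choices only.

```python
import math

def fixed_step_descent(grad, x0, beta, num_steps):
    """Iterates of (8.25): x_{k+1} = x_k - (1/beta) * g_k."""
    xs = [list(x0)]
    for _ in range(num_steps):
        g = grad(xs[-1])
        xs.append([xi - gi / beta for xi, gi in zip(xs[-1], g)])
    return xs

# Assumed example: gradient is Lipschitz with beta = 10, minimizer x* = 0, f(x*) = 0.
f = lambda x: 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)
grad = lambda x: [x[0], 10.0 * x[1]]
beta = 10.0
x0 = [1.0, 1.0]
iterates = fixed_step_descent(grad, x0, beta, 50)
dist0_sq = sum(v * v for v in x0)             # |x_0 - x*|^2

def bounds_hold(k, slack=1e-12):
    """Check both inequalities of Theorem 1 at step k."""
    xk = iterates[k]
    gap_ok = f(xk) <= beta / (2.0 * (k + 1)) * dist0_sq + slack
    gnorm = math.sqrt(sum(gi * gi for gi in grad(xk)))
    # The gradient bound is only meaningful for k >= 1.
    grad_ok = k == 0 or gnorm <= beta / math.sqrt(k * (k + 1)) * math.sqrt(dist0_sq) + slack
    return gap_ok and grad_ok

all_ok = all(bounds_hold(k) for k in range(len(iterates)))
```

Because this example is strongly convex, the observed decrease is much faster than the $1/k$ guarantee; the bounds only describe the worst case.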

Theorem 1 implies that the convergence speed of the steepest descent method is
arithmetic; that is, the error $f(\mathbf{x}_k) - f(\mathbf{x}^*)$ is bounded by a quantity proportional to $1/(k+1)$.



The Quadratic Case



When $f(\mathbf{x})$ is strongly convex, the convergence speed can be increased from arithmetic
to geometric or linear convergence. Since all of the important convergence
characteristics of the method of steepest descent are revealed by an investigation of
the method when applied to quadratic problems, we focus here on

$$f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{Q}\mathbf{x} - \mathbf{x}^T\mathbf{b}, \tag{8.33}$$

where $\mathbf{Q}$ is a positive definite symmetric $n \times n$ matrix. Since $\mathbf{Q}$ is positive definite,
all of its eigenvalues are positive. We assume that these eigenvalues are ordered:
$0 < a = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n = A$. With $\mathbf{Q}$ positive definite, it follows (from Proposition 5,
Sect. 7.4) that $f$ is strictly convex.

The unique minimum point of $f$ can be found directly, by setting the gradient to
zero, as the vector $\mathbf{x}^*$ satisfying

$$\mathbf{Q}\mathbf{x}^* = \mathbf{b}. \tag{8.34}$$

Moreover, introducing the function

$$E(\mathbf{x}) = \frac{1}{2}(\mathbf{x} - \mathbf{x}^*)^T\mathbf{Q}(\mathbf{x} - \mathbf{x}^*), \tag{8.35}$$

we have $E(\mathbf{x}) = f(\mathbf{x}) + (1/2)\mathbf{x}^{*T}\mathbf{Q}\mathbf{x}^*$, which shows that the function $E$ differs from
$f$ only by a constant. For many purposes then, it will be convenient to consider that
we are minimizing $E$ rather than $f$.

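The gradient (of both $f$ and $E$) is given explicitly by

$$\mathbf{g}(\mathbf{x}) = \mathbf{Q}\mathbf{x} - \mathbf{b}. \tag{8.36}$$

Thus the method of steepest descent can be expressed as

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\mathbf{g}_k, \tag{8.37}$$

where $\mathbf{g}_k = \mathbf{Q}\mathbf{x}_k - \mathbf{b}$ and where $\alpha_k$ minimizes $f(\mathbf{x}_k - \alpha\mathbf{g}_k)$. We can, however, in this
special case, determine the value of $\alpha_k$ explicitly. We have, by definition (8.33),

$$f(\mathbf{x}_k - \alpha\mathbf{g}_k) = \frac{1}{2}(\mathbf{x}_k - \alpha\mathbf{g}_k)^T\mathbf{Q}(\mathbf{x}_k - \alpha\mathbf{g}_k) - (\mathbf{x}_k - \alpha\mathbf{g}_k)^T\mathbf{b},$$

which (as can be found by differentiating with respect to $\alpha$) is minimized at

$$\alpha_k = \frac{\mathbf{g}_k^T\mathbf{g}_k}{\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k}. \tag{8.38}$$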
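
For readers who want the intermediate step, the differentiation behind (8.38) can be written out as follows. This is a routine calculation using only (8.33), (8.36), and the symmetry of $\mathbf{Q}$.

```latex
\begin{aligned}
\frac{d}{d\alpha} f(\mathbf{x}_k - \alpha\mathbf{g}_k)
  &= -\mathbf{g}_k^T\mathbf{Q}(\mathbf{x}_k - \alpha\mathbf{g}_k) + \mathbf{g}_k^T\mathbf{b} \\
  &= -\mathbf{g}_k^T(\mathbf{Q}\mathbf{x}_k - \mathbf{b}) + \alpha\,\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k \\
  &= -\mathbf{g}_k^T\mathbf{g}_k + \alpha\,\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k .
\end{aligned}
```

Setting this derivative to zero gives the stated step, and the second derivative $\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k$ is positive whenever $\mathbf{g}_k \neq \mathbf{0}$, so the stationary point is indeed a minimum.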



Hence the method of steepest descent (8.37) takes the explicit form

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \left(\frac{\mathbf{g}_k^T\mathbf{g}_k}{\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k}\right)\mathbf{g}_k, \tag{8.39}$$

where $\mathbf{g}_k = \mathbf{Q}\mathbf{x}_k - \mathbf{b}$.

The function $f$ and the steepest descent process can be illustrated as in Fig. 8.9 by
showing contours of constant values of $f$ and a typical sequence developed by the
process. The contours of $f$ are $n$-dimensional ellipsoids with axes in the directions
of the $n$ mutually orthogonal eigenvectors of $\mathbf{Q}$. The axis corresponding to the $i$th
eigenvector has length proportional to $1/\lambda_i$. We now analyze this process and show
that the rate of convergence depends on the ratio of the lengths of the axes of the
elliptical contours of $f$, that is, on the eccentricity of the ellipsoids.
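
The explicit iteration (8.39) is short enough to write out in full. The sketch below uses plain Python lists; the helper names, the stopping threshold on $\mathbf{g}_k^T\mathbf{g}_k$, and the particular $2 \times 2$ matrix are assumptions chosen for illustration.

```python
def matvec(Q, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(q * vj for q, vj in zip(row, v)) for row in Q]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent_quadratic(Q, b, x0, num_steps=100):
    """Iterate (8.39) for f(x) = 0.5 x^T Q x - x^T b with Q positive definite."""
    x = list(x0)
    for _ in range(num_steps):
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]   # g_k = Q x_k - b
        gg = dot(g, g)
        if gg < 1e-30:                                     # gradient is numerically zero
            break
        alpha = gg / dot(g, matvec(Q, g))                  # exact step (8.38)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

# Assumed example; the solution of Q x = b is x* = (0.2, 0.4).
Q = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x = steepest_descent_quadratic(Q, b, [0.0, 0.0])
```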




Fig. 8.9 Steepest descent 



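Lemma 2. The iterative process (8.39) satisfies

$$E(\mathbf{x}_{k+1}) = \left\{1 - \frac{(\mathbf{g}_k^T\mathbf{g}_k)^2}{(\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k)(\mathbf{g}_k^T\mathbf{Q}^{-1}\mathbf{g}_k)}\right\}E(\mathbf{x}_k). \tag{8.40}$$

Proof. The proof is by direct computation. We have, setting $\mathbf{y}_k = \mathbf{x}_k - \mathbf{x}^*$,

$$\frac{E(\mathbf{x}_k) - E(\mathbf{x}_{k+1})}{E(\mathbf{x}_k)} = \frac{2\alpha_k\mathbf{g}_k^T\mathbf{Q}\mathbf{y}_k - \alpha_k^2\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k}{\mathbf{y}_k^T\mathbf{Q}\mathbf{y}_k}.$$

Using $\mathbf{g}_k = \mathbf{Q}\mathbf{y}_k$ we have

$$\frac{E(\mathbf{x}_k) - E(\mathbf{x}_{k+1})}{E(\mathbf{x}_k)} = \frac{\dfrac{2(\mathbf{g}_k^T\mathbf{g}_k)^2}{\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k} - \dfrac{(\mathbf{g}_k^T\mathbf{g}_k)^2}{\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k}}{\mathbf{g}_k^T\mathbf{Q}^{-1}\mathbf{g}_k} = \frac{(\mathbf{g}_k^T\mathbf{g}_k)^2}{(\mathbf{g}_k^T\mathbf{Q}\mathbf{g}_k)(\mathbf{g}_k^T\mathbf{Q}^{-1}\mathbf{g}_k)}. \ ∎$$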



In order to obtain a bound on the rate of convergence, we need a bound on the right-hand
side of (8.40). The best bound is due to Kantorovich and his lemma, stated
below, is a useful general tool in convergence analysis.

Kantorovich inequality. Let $\mathbf{Q}$ be a positive definite symmetric $n \times n$ matrix. For any vector
$\mathbf{x}$ there holds

$$\frac{(\mathbf{x}^T\mathbf{x})^2}{(\mathbf{x}^T\mathbf{Q}\mathbf{x})(\mathbf{x}^T\mathbf{Q}^{-1}\mathbf{x})} \geq \frac{4aA}{(a + A)^2}, \tag{8.41}$$

where $a$ and $A$ are, respectively, the smallest and largest eigenvalues of $\mathbf{Q}$.

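Proof. Let the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $\mathbf{Q}$ satisfy

$$0 < a = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n = A.$$

By an appropriate change of coordinates the matrix $\mathbf{Q}$ becomes diagonal with diagonal
$(\lambda_1, \lambda_2, \ldots, \lambda_n)$. In this coordinate system we have

$$\frac{(\mathbf{x}^T\mathbf{x})^2}{(\mathbf{x}^T\mathbf{Q}\mathbf{x})(\mathbf{x}^T\mathbf{Q}^{-1}\mathbf{x})} = \frac{\left(\sum_{i=1}^{n} x_i^2\right)^2}{\left(\sum_{i=1}^{n}\lambda_i x_i^2\right)\left(\sum_{i=1}^{n} x_i^2/\lambda_i\right)},$$

which can be written as

$$\frac{(\mathbf{x}^T\mathbf{x})^2}{(\mathbf{x}^T\mathbf{Q}\mathbf{x})(\mathbf{x}^T\mathbf{Q}^{-1}\mathbf{x})} = \frac{1/\sum_{i=1}^{n}\xi_i\lambda_i}{\sum_{i=1}^{n}(\xi_i/\lambda_i)} \equiv \frac{\phi(\xi)}{\psi(\xi)},$$

where $\xi_i = x_i^2/\sum_{j=1}^{n} x_j^2$. We have converted the expression to the ratio of two functions
involving convex combinations; one a combination of $\lambda_i$'s; the other a combination
of $1/\lambda_i$'s. The situation is shown pictorially in Fig. 8.10. The curve in the
figure represents the function $1/\lambda$. Since $\sum_{i=1}^{n}\xi_i\lambda_i$ is a point between $\lambda_1$ and $\lambda_n$, the
value of $\phi(\xi)$ is a point on the curve. On the other hand, the value of $\psi(\xi)$ is a convex
combination of points on the curve and its value corresponds to a point in the shaded
region. For the same vector $\xi$ both functions are represented by points on the same
vertical line. The minimum value of this ratio is achieved for some $\lambda = \xi_1\lambda_1 + \xi_n\lambda_n$,
with $\xi_1 + \xi_n = 1$. Using the relation $\xi_1/\lambda_1 + \xi_n/\lambda_n = (\lambda_1 + \lambda_n - \xi_1\lambda_1 - \xi_n\lambda_n)/(\lambda_1\lambda_n)$,
an appropriate bound is

$$\frac{\phi(\xi)}{\psi(\xi)} \geq \min_{\lambda_1 \leq \lambda \leq \lambda_n}\frac{1/\lambda}{(\lambda_1 + \lambda_n - \lambda)/(\lambda_1\lambda_n)}.$$

The minimum is achieved at $\lambda = (\lambda_1 + \lambda_n)/2$, yielding

$$\frac{\phi(\xi)}{\psi(\xi)} \geq \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}. \ ∎$$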
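
A quick numerical check of (8.41) is easy to set up for a diagonal $\mathbf{Q}$. In the sketch below the eigenvalues, the random sample, and the function name are arbitrary illustrative choices; the vector with equal weight on the two extreme eigenvectors is the case in which the bound is attained.

```python
import random

def kantorovich_ratio(lams, x):
    """(x^T x)^2 / ((x^T Q x)(x^T Q^{-1} x)) for Q = diag(lams)."""
    xx = sum(xi * xi for xi in x)
    xQx = sum(l * xi * xi for l, xi in zip(lams, x))
    xQinvx = sum(xi * xi / l for l, xi in zip(lams, x))
    return xx * xx / (xQx * xQinvx)

lams = [0.5, 2.0, 3.0, 8.0]                  # assumed eigenvalues of Q
a, A = min(lams), max(lams)
lower = 4.0 * a * A / (a + A) ** 2           # right-hand side of (8.41)

rng = random.Random(0)                       # fixed seed for reproducibility
samples = [[rng.uniform(-1.0, 1.0) for _ in lams] for _ in range(1000)]
smallest_seen = min(kantorovich_ratio(lams, x) for x in samples)

# Equal weight on the eigenvectors for a and A attains the bound.
tight = kantorovich_ratio(lams, [1.0, 0.0, 0.0, 1.0])
```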

Combining the above two lemmas, we obtain the central result on the convergence 
of the method of steepest descent. 

Theorem 2 (Steepest Descent: Quadratic Case). For any $\mathbf{x}_0 \in E^n$ the method of steepest
descent (8.39) converges to the unique minimum point $\mathbf{x}^*$ of $f$. Furthermore, with
$E(\mathbf{x}) = \frac{1}{2}(\mathbf{x} - \mathbf{x}^*)^T\mathbf{Q}(\mathbf{x} - \mathbf{x}^*)$, there holds at every step $k$

$$E(\mathbf{x}_{k+1}) \leq \left(\frac{A - a}{A + a}\right)^2 E(\mathbf{x}_k). \tag{8.42}$$

Proof. By Lemma 2 and the Kantorovich inequality

$$E(\mathbf{x}_{k+1}) \leq \left\{1 - \frac{4aA}{(A + a)^2}\right\}E(\mathbf{x}_k) = \left(\frac{A - a}{A + a}\right)^2 E(\mathbf{x}_k).$$

It follows immediately that $E(\mathbf{x}_k) \to 0$ and hence, since $\mathbf{Q}$ is positive definite, that
$\mathbf{x}_k \to \mathbf{x}^*$. ∎

Fig. 8.10 Kantorovich inequality

Roughly speaking, the above theorem says that the convergence rate of steepest
descent is slowed as the contours of $f$ become more eccentric. If $a = A$, corresponding
to circular contours, convergence occurs in a single step. Note, however, that
even if $n - 1$ of the $n$ eigenvalues are equal and the remaining one is a great distance
from these, convergence will be slow, and hence a single abnormal eigenvalue can
destroy the effectiveness of steepest descent.

In the terminology introduced in Sect. 7.8, the above theorem states that with
respect to the error function $E$ (or equivalently $f$) the method of steepest descent
converges linearly with a ratio no greater than $[(A - a)/(A + a)]^2$. The actual rate
depends on the initial point $\mathbf{x}_0$. However, for some initial points the bound is actually
achieved. Furthermore, it has been shown by Akaike that, if the ratio is unfavorable,
the process is very likely to converge at a rate close to the bound. Thus, somewhat
loosely but with reasonable justification, we say that the convergence ratio of steepest
descent is $[(A - a)/(A + a)]^2$.
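
These statements can be illustrated on a two-variable problem with $\mathbf{Q}$ diagonal and $\mathbf{x}^* = \mathbf{0}$. The sketch below is an assumption-laden example rather than a general routine: the eigenvalues 1 and 10, the starting points, and the helper names are chosen here, and the "worst" starting point has components proportional to $1/\lambda_i$, which is one point at which the bound is attained.

```python
def energy(lams, y):
    """E for Q = diag(lams) and y = x - x*."""
    return 0.5 * sum(l * yi * yi for l, yi in zip(lams, y))

def exact_step(lams, y):
    """One step of (8.39) written in terms of y = x - x*, so g = Q y."""
    g = [l * yi for l, yi in zip(lams, y)]
    alpha = sum(gi * gi for gi in g) / sum(l * gi * gi for l, gi in zip(lams, g))
    return [yi - alpha * gi for yi, gi in zip(y, g)]

def ratio(lams, y):
    """Observed one-step reduction E(x_{k+1}) / E(x_k)."""
    return energy(lams, exact_step(lams, y)) / energy(lams, y)

lams = [1.0, 10.0]                                 # a = 1, A = 10
bound = ((10.0 - 1.0) / (10.0 + 1.0)) ** 2         # [(A - a)/(A + a)]^2
starts = [[1.0, 1.0], [5.0, -2.0], [0.3, 7.0]]
observed = [ratio(lams, y) for y in starts]
worst = ratio(lams, [10.0, 1.0])                   # components proportional to 1/lambda_i

# Circular contours (a = A): a single step reaches the minimum.
one_step = exact_step([3.0, 3.0], [1.0, 2.0])
```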

It should be noted that the convergence rate actually depends only on the ratio
$r = A/a$ of the largest to the smallest eigenvalue. Thus the convergence ratio is




